Posts for the month of July 2010

Fixing Bitten

Bitten is broken and I'm trying to fix it. To cut a long story short, the slave checks out the code just fine but then stops without running the tests:

[INFO    ] A    examples/elphf/generated/phaseDiffusion/binary.pdf
[INFO    ] A    examples/elphf/generated/phaseDiffusion/quaternary.png
[INFO    ] A    examples/elphf/generated/phaseDiffusion/ternaryAndElectrons.pdf
[INFO    ] A    examples/elphf/phaseDiffusion.py
[INFO    ]  U   .
[INFO    ] Checked out revision 3712.
[DEBUG   ] svn exited with code 0
[INFO    ] Build step checkout completed successfully
[DEBUG   ] Sending POST request to 'http://matforge.org/fipy/builds/1414/steps/'
[DEBUG   ] Server returned error 500: Internal Server Error (no message available)
[ERROR   ] Exception raised processing step checkout. Reraising HTTP Error 500: Internal Server Error
[DEBUG   ] Stopping keepalive thread
[DEBUG   ] Keepalive thread exiting.
[DEBUG   ] Keepalive thread stopped
[DEBUG   ] Removing build directory /tmp/bittenSnQIvn/build_trunk_1414
[ERROR   ] HTTP Error 500: Internal Server Error
[DEBUG   ] Removing working directory /tmp/bittenSnQIvn
[INFO    ] Slave exited at 2010-07-28 14:22:36

The master's log file shows the following error, which seems to occur at the same moment when the two logs are watched side by side with tail -f. Could this be connected?

2010-07-28 14:26:36,900 Trac[perm] WARNING: perm.permissions() is deprecated and is only present for HDF compatibility
2010-07-28 14:27:45,230 Trac[main] ERROR: 'time'
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/web/main.py", line 423, in _dispatch_request
    dispatcher.dispatch(req)
  File "/usr/local/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/web/main.py", line 197, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/local/lib/python2.4/site-packages/Bitten-0.6dev_r562-py2.4.egg/bitten/master.py", line 93, in process_request
    return self._process_build_step(req, config, build)
  File "/usr/local/lib/python2.4/site-packages/Bitten-0.6dev_r562-py2.4.egg/bitten/master.py", line 229, in _process_build_step
    step.started = int(_parse_iso_datetime(elem.attr['time']))
  File "/usr/local/lib/python2.4/site-packages/Bitten-0.6dev_r562-py2.4.egg/bitten/util/xmlio.py", line 252, in __getitem__
    raise KeyError(name)
KeyError: 'time'

The times are out of whack because matforge is 5 minutes fast. Let me investigate further.
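As a rough check on the clock offset, the two timestamps quoted above can be subtracted directly. This is only a sketch: it assumes the slave's exit line and the master's error line describe nearly the same moment, and the helper name `parse_log_time` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the two logs above.
SLAVE_EXIT = "2010-07-28 14:22:36"        # slave log, slave's clock
MASTER_ERROR = "2010-07-28 14:27:45,230"  # Trac log on matforge, master's clock


def parse_log_time(stamp):
    """Parse a log timestamp with or without a ',milliseconds' suffix."""
    fmt = "%Y-%m-%d %H:%M:%S,%f" if "," in stamp else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(stamp, fmt)


# Apparent offset between the two clocks (master minus slave).
skew = parse_log_time(MASTER_ERROR) - parse_log_time(SLAVE_EXIT)
print(skew)
```

The difference comes out at a little over five minutes, which is consistent with matforge running about 5 minutes fast.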

  • Posted: 2010-07-28 14:31 (Updated: 2010-07-28 14:32)
  • Author: wd15
  • Categories: (none)
  • Comments (2)

Analysis of parallel speed ups

In the process of working with James I have tried to analyze some parallel runs more deeply. I should mention that http://www.mcs.anl.gov/~itf/dbpp is proving to be quite a useful text for understanding some of the concepts that I had overlooked. We can write an expression for the time taken by a given time step based on various aspects of the parallel partitioning, something like,

[Equation not rendered: the TracMath macro failed because pdflatex could not find the font ecrm0700.]

where [symbol not rendered] is the number of cells on node $i$ including overlaps, [symbol not rendered] is the number of overlapping cells on node $i$ and $N_P$ is the total number of nodes. The terms in the equations represent in order

  • the local calculations (should be perfectly parallel in most of fipy outside the solver),
  • the local processor to processor communication (questionable if this actually exists),
  • global communication (probably more likely),
  • calculations that are across the global cells (there should be none of this, very bad for scaling)
  • a fixed penalty independent of the mesh or partitioning

To look at the relative influence of each term, I ran the anisotropy problem for various grid sizes and numbers of nodes, recorded the times and fit the data by least squares. In the fit each timing value is weighted equally and the fastest of 10 time steps is used for $\Delta t$. This is done because luggage has high variability, especially when egon is running OpenMP jobs. Using the attached scripts I get $\alpha=2.9\times10^{-4},\;\beta=-3.4\times10^{-3},\;\gamma=1.7\times10^{-4},\;\delta=-1.1\times10^{-4},\;\epsilon=7.3\times10^{1}$ along with the following plot. The problem with this fit is that two parameters are negative (unphysical) and $\epsilon$ is way too big. This happens because the script does not adjust the number of crystals to the box size (comparatively less work in the solver per cell with fewer processors). As a quick fix we can assume the second and fourth terms are negligible and see how the fit looks. We get $\alpha=2.6\times10^{-4},\;\gamma=9.3\times10^{-5},\;\epsilon=3.5\times10^{-1}$ and a new plot.
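For reference, an equally weighted least squares fit of the reduced three-parameter model can be sketched with numpy as below. The equation itself did not survive rendering, so the linear form used here (step time = alpha * cells per node + gamma * number of nodes + epsilon) is an assumption, not necessarily the expression the attached scripts fit. The timings are synthetic, generated from made-up coefficients purely to show that the fit recovers them; they are not the measured data.

```python
import numpy as np


def fit_step_time(cells_per_node, n_nodes, step_times):
    """Equally weighted least squares fit of an ASSUMED linear model:

        step_time ~ alpha * cells_per_node + gamma * n_nodes + epsilon
    """
    design = np.column_stack([
        np.asarray(cells_per_node, dtype=float),  # local calculation term
        np.asarray(n_nodes, dtype=float),         # global communication term
        np.ones(len(step_times)),                 # fixed penalty
    ])
    coeffs, *_ = np.linalg.lstsq(
        design, np.asarray(step_times, dtype=float), rcond=None
    )
    return coeffs


# Synthetic example only: made-up coefficients, not the values from the post.
TRUE_COEFFS = (2.0e-4, 1.0e-4, 0.5)
total_cells, nodes = np.meshgrid(
    [100**2, 200**2, 400**2, 800**2], [1, 2, 4, 8, 16]
)
cells_per_node = (total_cells / nodes).ravel()
n_nodes = nodes.ravel()
step_times = (TRUE_COEFFS[0] * cells_per_node
              + TRUE_COEFFS[1] * n_nodes
              + TRUE_COEFFS[2])

alpha, gamma, epsilon = fit_step_time(cells_per_node, n_nodes, step_times)
print(alpha, gamma, epsilon)
```

With real timings, the fastest of the 10 steps for each run would go into `step_times`, one entry per grid size and node count.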

This needs to be rerun with updated timing values, with the number of crystals increased in proportion to the box size.

  • Posted: 2010-07-28 11:55 (Updated: 2010-07-28 16:33)
  • Author: wd15
  • Categories: (none)
  • Comments (1)

Testing Anisotropy Example on a 4000 x 4000 grid

I'm running the anisotropy example on 32 processors on luggage. The phase and theta images seem to make sense after 30 steps. It took 11593 seconds to reach this point.

I also ran a 200 x 200 grid with 1 crystal (which is roughly the area occupied by each crystal in the larger simulation). Running on 6 processors on poole, it takes about 734 s to do 690 time steps. After 690 time steps the area of the crystal is 1.05 (the area of the box is 25). The initial area of the crystal is 0.05. So, doing a few calculations:

>>> import numpy
>>> numpy.pi * (0.025 * 5.)**2
0.049087385212340517
>>> numpy.sqrt(0.05 / numpy.pi)
0.126156626101008
>>> numpy.sqrt(1.05 / numpy.pi)
0.57812228852811087
>>> (0.57812228852811087 - 0.126156626101008) / 690
0.00065502269916971432

and the rate of expansion is 0.00066 per time step. For the crystals to touch and show some structure, each crystal must expand a distance of 2.5, which requires 2.5 / 0.00066 ~ 4000 time steps. At 11593 s for 30 steps (about 386 s per step), the large 4000 x 4000 case will take ~18 days to get to a good point. We might not need the crystals to grow a distance of 2.5, but growth will probably slow down as the box fills with solid. We don't have 18 days at this point, so on this evidence I'm going to scale back to 3000 x 3000 with 225 crystals. I think that will be pretty much guaranteed to show something good in 10 days.

Okay, the 3000 x 3000 case is taking about 100 s a time step, which will take about 5 days to do 4000 time steps. Much more reasonable.
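The run time estimates above are simple arithmetic on the numbers already quoted, and can be reproduced with a short sketch. The function name `days_to_finish` is only illustrative.

```python
SECONDS_PER_DAY = 86400.0


def days_to_finish(seconds_per_step, n_steps):
    """Wall clock days needed for n_steps at a constant cost per step."""
    return seconds_per_step * n_steps / SECONDS_PER_DAY


# Steps for a crystal to grow a distance of 2.5 at 0.00066 per step
# (rounded up to 4000 in the text).
steps_needed = 2.5 / 0.00066

# 4000 x 4000 case: 11593 s for the first 30 steps.
large_case_days = days_to_finish(11593 / 30, 4000)

# 3000 x 3000 case: about 100 s per step.
small_case_days = days_to_finish(100, 4000)

print(round(steps_needed), round(large_case_days, 1), round(small_case_days, 1))
```

This assumes the cost per step stays constant over the whole run, which is the same assumption made in the text.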

  • Posted: 2010-07-16 18:19 (Updated: 2010-07-16 18:43)
  • Author: wd15
  • Categories: (none)
  • Comments (0)

Testing the divorcePysparse branch

The tables below show wall clock simulation times in seconds for the source:branches/divorcePysparse@3716 branch and for source:trunk@3716 on poole, for 10 time steps of source:sandbox/anisotropy.py@3717. Each simulation was run 5 times on 1, 2 and 4 processors.

source:branches/divorcePysparse@3716

Processors      1      2      4
Run 1       30.80  17.57  10.14
Run 2       37.29  19.73   9.74
Run 3       30.51  19.18   9.73
Run 4       32.23  16.86  10.35
Run 5       30.27  16.86  10.27

source:trunk@3716

Processors      1      2      4
Run 1       23.03  12.08   7.63
Run 2       21.46  13.07   7.04
Run 3       22.91  14.88   6.98
Run 4       22.74  11.99   7.50
Run 5       32.59  11.87   7.54
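
To compare the two sets of timings, one option is to take the median of the 5 runs for each processor count and compute the speedup relative to the single processor run. The sketch below does this with the numbers from the tables; using the median (rather than the mean or the minimum) is a choice made here to damp the occasional slow run, not something prescribed by the test itself.

```python
from statistics import median

# Wall clock times in seconds, copied from the tables above,
# keyed by number of processors.
BRANCH = {
    1: [30.80, 37.29, 30.51, 32.23, 30.27],
    2: [17.57, 19.73, 19.18, 16.86, 16.86],
    4: [10.14, 9.74, 9.73, 10.35, 10.27],
}
TRUNK = {
    1: [23.03, 21.46, 22.91, 22.74, 32.59],
    2: [12.08, 13.07, 14.88, 11.99, 11.87],
    4: [7.63, 7.04, 6.98, 7.50, 7.54],
}


def summarize(timings):
    """Return {processors: (median time, speedup relative to 1 processor)}."""
    medians = {procs: median(times) for procs, times in timings.items()}
    return {procs: (m, medians[1] / m) for procs, m in medians.items()}


for name, timings in (("divorcePysparse", BRANCH), ("trunk", TRUNK)):
    for procs, (med, speedup) in sorted(summarize(timings).items()):
        print(f"{name:16s} {procs} procs: median {med:6.2f} s, speedup {speedup:4.2f}")
```

On these numbers the branch's median times are roughly 1.3 to 1.5 times the trunk's, while both show a speedup of about 3 on 4 processors.
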
  • Posted: 2010-07-14 15:27 (Updated: 2010-07-14 15:40)
  • Author: wd15
  • Categories: (none)
  • Comments (0)