wiki:u/madams/WTSRunStatistics_Stampede

Version 15 (modified by madams, 10 years ago)

Theoretical cpu*hrs

We have 295,867 SUs available on our account. Jobs were submitted to the normal queue. The first batch ran on 512 CPUs for 15 hours. I sampled four timestamps using ls -ltr (one at the beginning, one in the middle, and two at the end) between the selected chombos.


There are 1,440 minutes in a day, so we can calculate the average rate:
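A rate calculation of this kind can be sketched as follows. This is a minimal illustration, not the original calculation: the timestamps are hypothetical placeholders rather than the values sampled from the run, and the helper name `average_minutes_per_frame` is made up for this example.

```python
from datetime import datetime

MINUTES_PER_DAY = 1440  # from the text: 1,440 minutes in a day


def average_minutes_per_frame(samples):
    """Average wall-clock minutes per frame between the first and last sample.

    samples: list of (frame_number, datetime) pairs, e.g. read off `ls -ltr`.
    """
    samples = sorted(samples)
    (first_frame, first_time), (last_frame, last_time) = samples[0], samples[-1]
    elapsed_minutes = (last_time - first_time).total_seconds() / 60.0
    return elapsed_minutes / (last_frame - first_frame)


# Hypothetical timestamps (beginning, middle, two at the end).
# These are NOT the values from the actual run.
samples = [
    (0, datetime(2015, 5, 7, 16, 0)),
    (10, datetime(2015, 5, 7, 21, 0)),
    (19, datetime(2015, 5, 8, 1, 30)),
    (20, datetime(2015, 5, 8, 2, 0)),
]

minutes_per_frame = average_minutes_per_frame(samples)
frames_per_day = MINUTES_PER_DAY / minutes_per_frame
print(f"{minutes_per_frame:.1f} min/frame, {frames_per_day:.1f} frames/day")
```

Only the first and last samples enter the average; the intermediate ones are useful for checking that the rate is not drifting as the run progresses.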


For both of these runs we want 200 frames. Subtracting the frames completed in the first rows of the jobs listed below (23 for hydro, 18 for MHD), we have 177 frames left for the hydro case and 182 for the MHD case. Using the average rates from above, it will take approximately 3 days to finish the hydro run and 5 days to finish the MHD run. There are 72 hours in 3 days and 120 hours in 5 days, so I will be under the SU limit for the following cpu*hrs.

512 cores

For the hydro case: 512 cores × 72 hrs = 36,864 cpu*hrs

For the MHD case: 512 cores × 120 hrs = 61,440 cpu*hrs

Total: 98,304 cpu*hrs
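The cpu*hrs budget for the two runs can be checked with a short script. This is a sketch using only the numbers quoted on this page. It assumes that one SU corresponds to one cpu*hr, as the text implies, and that both runs stay on 512 cores for the whole estimate.

```python
ALLOCATION_SUS = 295_867  # SUs available on the account (from the text)
CORES = 512               # assumed constant for both runs
TOTAL_FRAMES = 200

# Frames completed so far, from the first hydro and MHD rows of the table.
frames_done = {"hydro": 23, "mhd": 18}
# Estimated remaining wall time in days, from the text.
days_left = {"hydro": 3, "mhd": 5}

frames_left = {run: TOTAL_FRAMES - done for run, done in frames_done.items()}
# Assumption: 1 SU == 1 cpu*hr (cores x wall-clock hours).
core_hours = {run: CORES * days * 24 for run, days in days_left.items()}
total_core_hours = sum(core_hours.values())

for run in core_hours:
    print(f"{run}: {frames_left[run]} frames left, {core_hours[run]:,} cpu*hrs")
print(f"total: {total_core_hours:,} cpu*hrs of {ALLOCATION_SUS:,} SUs")
```

The estimate is sensitive to the framerate: the later rows of the table show the hours per frame rising as the filling fractions grow, so a constant rate gives a lower bound on the cost.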

Stampede Run Statistics

|| Hydro/MHD || Machine || Path || Date Row Edited || MaxLevel || Frames Completed/Total || Filling Fracs || Walltime Taken/Left (0 days == Ready for Restart) || (1-2) Info, (3) Message allocations || Framerate (hrs/frame), No. cores || Notes ||
|| Hydro || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro || 05-11-2015 || 2 || 23/200 || 0.038 0.622 || 3.5 days/0 days || 34.6 gb, 164.9 mb, 56.9 mb || ~0.5 hrs/frame, 512 cores || Chombos started May 7th at ~15:46. ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart || 05-26-2015 || 2 || 27/200 || 0.200 0.800 || 7.4 days/0 days || 164.6 gb, 216.4 mb, 98.6 mb || ~1 hrs/frame, 1024 cores || Using previous run, going to do a study on number of CPUs needed for SUs left. ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart/Restart || 05-26-2015 || 2 || 38/200 || 0.274 0.921 || 16.8 days/0 days || 197.6 gb, 495.7 mb, 222.8 mb || ~2 hrs/frame, 512 cores ||  ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart/Restart/Restart || 05-26-2015 || 2 || 46/200 || 0.369 0.898 || 19.7 days/0 days || 246.6 gb, 675.3 mb, 276.4 mb || ~2 hrs/frame, 512 cores ||  ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart/Restart/Restart/Restart || 05-26-2015 || 2 || 53/200 || 0.369 0.898 || 19.7 days/0 days || 246.6 gb, 675.3 mb, 276.4 mb || ~3 hrs/frame, 512 cores ||  ||
||  || Bluestreak || /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart || 06-02-2015 || 2 || 54/200 || 0.402 0.957 || 2.4 mo/0 days || 357.4 gb, 306.5 mb, 238.3 mb || 24 hours, 2048 cores || Transferred simulations from Stampede to BS ||
||  || Bluestreak || /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart || 06-03-2015 || 2 || 55/200 || 0.402 0.957 || 2.4 mo/0 days || 357.4 gb, 306.5 mb, 238.3 mb || 24 hours, 2048 cores ||  ||
||  || Bluestreak || /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart/Restart || 06-03-2015 || 2 || 57/200 || 0.414 0.965 || 1.4 mo/0 days || 452.4 gb, 159.3 mb, 128.0 mb || 8 hours, 4096 cores ||  ||
||  || Bluestreak || /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart/Restart/Restart || 06-03-2015 || 2 || 57/200 || TBA || PENDING || TBA || 8 hours, 8192 cores ||  ||
|| MHD || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd || 05-11-2015 || 2 || 18/200 || 0.091 0.865 || 20.4 days/0 days || 225.3 gb, 585.0 mb, 240.9 mb || ~0.75 hrs/frame, 512 cores || Chombos started May 8th at ~9:45. ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart || 05-26-2015 || 2 || 24/200 || 0.179 0.825 || 26.6 days/0 days || 365.1 gb, 909.8 mb, 360.9 mb || ~3 hrs/frame, 512 cores || Using previous run, going to do a study on number of CPUs needed for SUs left. ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart/Restart || 05-26-2015 || 2 || 30/200 || 0.215 0.891 || 28.9 days/0 days || 435.3 gb, 1.3 gb, 461.0 mb || ~3 hrs/frame, 512 cores ||  ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart/Restart/Restart || 05-26-2015 || 2 || 36/200 || 0.215 0.891 || 28.9 days/0 days || 435.3 gb, 1.3 gb, 461.0 mb || ~4 hrs/frame, 512 cores ||  ||
||  || Stampede || /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart/Restart/Restart/Restart || 06-02-2015 || 2 || 39/200 || TBA || PENDING || TBA || 512 cores || Stampede is currently down, so I cannot look up the information. ||
||  || Bluestreak || /scratch/madams15/TurbulenceRuns/ProductionRuns/mhd/Restart || 06-02-2015 || 1 || 39/200 || TBA || PENDING || TBA || 4096 cores || Changed MaxLevel to 1. Crashed with error 1 (Restart request). ||

Errors:

1.

 Restart requested
 processor 3555 requesting restart due to nan in flux
 wl=    0.546425498507929E+01    0.501485374601183E+01    0.558838624845500E+00    0.244801054917926E+00    0.410574189551385E+01 \
   0.976156705621003E+00    0.153834152549860E+01    0.368305795822671E-01
 wr=    0.168565634935704E+03    0.127980402442876E+00    0.256235677712142E+00    0.187803293524624E+00    0.339058630978803E+02 \
   0.976156705621003E+00    0.178492062272036E+02    0.150281967080262E+01
at position = (  0.1203E+01,   0.2188E-01,   0.5094E+00)
 processor 3756 requesting restart due to nan in flux
 wl=    0.546425498507929E+01    0.501485374601183E+01    0.558838624845500E+00    0.244801054917926E+00    0.410574189551385E+01 \
   0.976156705621003E+00    0.153834152549860E+01    0.368305795822671E-01
 wr=    0.168565634935704E+03    0.127980402442876E+00    0.256235677712142E+00    0.187803293524624E+00    0.339058630978803E+02 \
   0.976156705621003E+00    0.178492062272036E+02    0.150281967080262E+01
at position = (  0.1203E+01,   0.2188E-01,   0.4469E+00)
 Advanced level  0 to tnext= 0.1950E+00 with dt= 0.1325E-15 CFL= 0.7685E-12 max speed= 0.3626E+02
   Advanced level  1 to tnext= 0.1950E+00 with dt= 0.6624E-16 CFL= 0.1070E-11 max speed= 0.5050E+02
   Advanced level  1 to tnext= 0.1950E+00 with dt= 0.6624E-16 CFL= 0.1070E-11 max speed= 0.5050E+02
 Info allocations    =   161.6 gb  137.9 mb
 message allocations =   ------    147.9 mb
 sweep allocations   =   ------     36.0 mb
 filling fractions   =   0.258
 Current efficiency  =  69%
 Cell updates/second =       1565      3041  51%
 Wall Time Remaining = never       at frame   39.0 of    200
 AMR Speed-Up Factor =       0.9972E+00
 Restart requested
 Min Timestep Reached, Stopping, Please email astrobear_dev@pas.rochester.edu for help
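When comparing crashes across restarts, it can help to pull the processor numbers, positions, and time steps out of output like the above. The following is a minimal sketch that assumes the log format shown here; `parse_restart_log` is a hypothetical helper written for this page, not part of the simulation code.

```python
import re

# Excerpt of the log lines shown above (values copied from this page).
log_text = """\
 processor 3555 requesting restart due to nan in flux
at position = (  0.1203E+01,   0.2188E-01,   0.5094E+00)
 processor 3756 requesting restart due to nan in flux
at position = (  0.1203E+01,   0.2188E-01,   0.4469E+00)
 Advanced level  0 to tnext= 0.1950E+00 with dt= 0.1325E-15 CFL= 0.7685E-12 max speed= 0.3626E+02
"""

PROC_RE = re.compile(r"processor\s+(\d+)\s+requesting restart due to nan in flux")
POS_RE = re.compile(r"at position = \(\s*([^,]+),\s*([^,]+),\s*([^)]+)\)")
DT_RE = re.compile(r"Advanced level\s+(\d+) to tnext=\s*(\S+) with dt=\s*(\S+)")


def parse_restart_log(text):
    """Return (events, steps) extracted from the log text.

    events: (processor, (x, y, z)) for each NaN-in-flux restart request,
            paired with the next "at position" line that follows it.
    steps:  (level, dt) for each "Advanced level" line.
    """
    events, steps, pending = [], [], None
    for line in text.splitlines():
        proc = PROC_RE.search(line)
        pos = POS_RE.search(line)
        step = DT_RE.search(line)
        if proc:
            pending = int(proc.group(1))
        elif pos and pending is not None:
            events.append((pending, tuple(float(v) for v in pos.groups())))
            pending = None
        elif step:
            steps.append((int(step.group(1)), float(step.group(3))))
    return events, steps


events, steps = parse_restart_log(log_text)
for processor, position in events:
    print(f"processor {processor}: NaN in flux at {position}")
for level, dt in steps:
    print(f"level {level}: dt = {dt:.3e}")
```

In this excerpt both requests come from the same x and y position and differ only in z, and the level 0 time step has collapsed to about 1e-16, which is consistent with the "Min Timestep Reached" stop.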
