Initial cpuhrs calculation:
Theoretical cpu*hrs
We have 295,867 SUs available on our account. Jobs are submitted to the normal queue. The first batch ran on 512 cpus for 15 hours. I sampled four time stamps with ls -ltr (one at the beginning, one in the middle, and two at the end) between the selected chombos.
There are 1,440 minutes in a day, so the sampled time stamps can be converted into an average rate for each run.
For both of these runs we want 200 frames. Subtracting the frames already completed (see the first hydro and MHD rows of the table below) leaves 177 frames for the hydro case and 182 for the MHD case. At the average rates above, it will take approximately 3 days to finish the hydro run and 5 days to finish the MHD run. That is 72 hours and 120 hours respectively, so the cpu*hrs below stay under the SU limit.
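On 512 cores:
For the hydro case: 512 cores × 72 hrs = 36,864 cpu*hrs
For the MHD case: 512 cores × 120 hrs = 61,440 cpu*hrs
Total: 36,864 + 61,440 = 98,304 cpu*hrs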
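The estimate above can be checked with a few lines of Python. This is only a sketch: it assumes that one SU is charged per core-hour (the text treats cpu*hrs and SUs interchangeably) and that the 3-day and 5-day run times hold. The function and variable names are made up for illustration.

```python
# Figures quoted in the text above.
SU_AVAILABLE = 295_867   # SUs available on the account
CORES = 512              # cores per job
HOURS_PER_DAY = 24


def cpu_hours(cores: int, days: float) -> float:
    """Core-hours used by a job on `cores` cores for `days` days."""
    return cores * days * HOURS_PER_DAY


hydro = cpu_hours(CORES, 3)   # ~3 days left for the hydro run
mhd = cpu_hours(CORES, 5)     # ~5 days left for the MHD run
total = hydro + mhd

# Assumption: 1 SU == 1 core-hour.
remaining = SU_AVAILABLE - total

print(f"hydro: {hydro:,.0f} cpu*hrs")
print(f"mhd:   {mhd:,.0f} cpu*hrs")
print(f"total: {total:,.0f} cpu*hrs, leaving {remaining:,.0f} SUs")
```

The 15 hours already spent on the first batch are not subtracted here; they would come out of the same allocation.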
Stampede Run Statistics
Hydro/MHD | Machine | Path | Date Row Edited | MaxLevel | Frames Completed/Total | Filling Fracs | Walltime Taken/Left (0 days == Ready for Restart) | (1-2) Info, (3) Message allocations | Framerate (hrs/frame), No. cores | Notes |
---|---|---|---|---|---|---|---|---|---|---|
Hydro | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro | 05-11-2015 | 2 | 23/200 | 0.038 0.622 | 3.5 days/0 days | 34.6 gb, 164.9 mb, 56.9 mb | ~0.5 hrs/frame, 512 cores | Chombos started May 7th at ~15:46. |
Hydro | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart | 05-26-2015 | 2 | 27/200 | 0.200 0.800 | 7.4 days/0 days | 164.6 gb 216.4 mb, 98.6 mb | ~1 hr/frame, 1024 cores | Using previous run; going to do a study on the number of cpus needed for the SUs left. |
Hydro | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart/Restart | 05-26-2015 | 2 | 38/200 | 0.274 0.921 | 16.8 days/0 days | 197.6 gb 495.7 mb, 222.8 mb | ~2 hrs/frame, 512 cores | |
Hydro | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart/Restart/Restart | 05-26-2015 | 2 | 46/200 | 0.369 0.898 | 19.7 days/0 days | 246.6 gb 675.3 mb, 276.4 mb | ~2 hrs/frame, 512 cores | |
Hydro | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/hydro/Restart/Restart/Restart/Restart | 05-26-2015 | 2 | 53/200 | 0.369 0.898 | 19.7 days/0 days | 246.6 gb 675.3 mb, 276.4 mb | ~3 hrs/frame, 512 cores | |
Hydro | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart | 06-02-2015 | 2 | 54/200 | 0.402 0.957 | 2.4 mo/0 days | 357.4 gb 306.5 mb, 238.3 mb | 24 hours, 2048 cores | Transferred simulations from Stampede to Bluestreak. |
Hydro | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart | 06-03-2015 | 2 | 55/200 | 0.402 0.957 | 2.4 mo/0 days | 357.4 gb 306.5 mb, 238.3 mb | 24 hours, 2048 cores | |
Hydro | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart/Restart | 06-03-2015 | 2 | 57/200 | 0.414 0.965 | 1.4 mo/0 days | 452.4 gb 159.3 mb, 128.0 mb | 8 hours, 4096 cores | |
Hydro | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart/Restart/Restart | 06-09-2015 | 2 | 60/200 | 0.446 0.962 | 29.9 days/0 days | 608.0 gb 112.2 mb, 256.0 mb | 8 hours, 8192 cores | |
Hydro | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart/Restart/Restart/Restart/out/level1 | 06-09-2015 | 1 | 87/200 | 0.528 | 5 days/0 days | 155.9 gb, 30.7 mb, 256.0 mb | 1 hour, 8192 cores | May want to visualize to see how the output looks. May want to switch back to 2 levels of amr if the resolution isn't fine enough; we have enough time other than 4 days. Maybe I'll just do the part that is level 1, and then restart from level 1 to 2 later on. |
Hydro | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/hydro/Restart/Restart/Restart/Restart/out/ | 06-10-2015 | 2 | 61/200 | 0.442 0.963 | 28.2 days/0.1 days | 609.4 gb 107.3 mb, 256.0 mb | 8192 cores | |
MHD | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd | 05-11-2015 | 2 | 18/200 | 0.091 0.865 | 20.4 days/0 days | 225.3 gb, 585.0 mb, 240.9 mb | ~0.75 hrs/frame, 512 cores | Chombos started May 8th at ~9:45. |
MHD | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart | 05-26-2015 | 2 | 24/200 | 0.179 0.825 | 26.6 days/0 days | 365.1 gb 909.8 mb, 360.9 mb | ~3 hrs/frame, 512 cores | Using previous run; going to do a study on the number of cpus needed for the SUs left. |
MHD | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart/Restart | 05-26-2015 | 2 | 30/200 | 0.215 0.891 | 28.9 days/0 days | 435.3 gb 1.3 gb, 461.0 mb | ~3 hrs/frame, 512 cores | |
MHD | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart/Restart/Restart | 05-26-2015 | 2 | 36/200 | 0.215 0.891 | 28.9 days/0 days | 435.3 gb 1.3 gb, 461.0 mb | ~4 hrs/frame, 512 cores | |
MHD | Stampede | /scratch/03517/tg828161/ProductionRuns_WireTurbulence/mhd/Restart/Restart/Restart/Restart | 06-02-2015 | 2 | 39/200 | TBA | PENDING | TBA | 512 cores | Stampede is currently down so I cannot look up the information. |
MHD | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/mhd/Restart | 06-09-2015 | 1 | 39/200 | 0.258 | never! | 161.6 gb 137.9 mb, 147.9 mb | 4096 cores | Changed MaxLevel to 1. Crashed with error 1 (Restart request). As of 06-08-2015, I remade the code for 1 lvl, and we'll see if I can restart it. |
MHD | Bluestreak | /scratch/madams15/TurbulenceRuns/ProductionRuns/mhd/out/ (astrobear_mhd_lvl1) | 06-09-2015 | 1 | 0/200 | | | | 8192 cores | Restart from beginning with executable for level 1 amr… moved the level 2 simulation to /scratch/madams15/TurbulenceRuns/ProductionRuns/mhd/out/level2 |
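For the study on the number of cpus needed for the SUs left, a rough cost per configuration can be read off the table as (frames left) × (hrs/frame) × (cores). The sketch below does this for three of the Stampede rows. It assumes the frame rate stays constant for the rest of the run, which the table suggests is optimistic, since the rate drops as the filling fractions grow. It also assumes 1 SU per core-hour. The helper name is hypothetical.

```python
TOTAL_FRAMES = 200


def remaining_core_hours(frames_done: int, hrs_per_frame: float, cores: int) -> float:
    """Core-hours needed to reach TOTAL_FRAMES at a constant frame rate."""
    return (TOTAL_FRAMES - frames_done) * hrs_per_frame * cores


# (label, frames completed, approx. hrs/frame, cores), copied from the table.
rows = [
    ("hydro Restart, 1024 cores", 27, 1.0, 1024),
    ("hydro Restart/Restart, 512 cores", 38, 2.0, 512),
    ("mhd Restart, 512 cores", 24, 3.0, 512),
]

estimates = {label: remaining_core_hours(done, rate, cores)
             for label, done, rate, cores in rows}

for label, cost in estimates.items():
    print(f"{label}: ~{cost:,.0f} cpu*hrs to finish")
```

Because the rates in the table are rounded and keep getting slower, these numbers are better read as lower bounds than as predictions.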
Errors:
1. Restart requested

    processor 3555 requesting restart due to nan in flux
    wl= 0.546425498507929E+01 0.501485374601183E+01 0.558838624845500E+00 0.244801054917926E+00 0.410574189551385E+01 \
        0.976156705621003E+00 0.153834152549860E+01 0.368305795822671E-01
    wr= 0.168565634935704E+03 0.127980402442876E+00 0.256235677712142E+00 0.187803293524624E+00 0.339058630978803E+02 \
        0.976156705621003E+00 0.178492062272036E+02 0.150281967080262E+01
    at position = ( 0.1203E+01, 0.2188E-01, 0.5094E+00)
    processor 3756 requesting restart due to nan in flux
    wl= 0.546425498507929E+01 0.501485374601183E+01 0.558838624845500E+00 0.244801054917926E+00 0.410574189551385E+01 \
        0.976156705621003E+00 0.153834152549860E+01 0.368305795822671E-01
    wr= 0.168565634935704E+03 0.127980402442876E+00 0.256235677712142E+00 0.187803293524624E+00 0.339058630978803E+02 \
        0.976156705621003E+00 0.178492062272036E+02 0.150281967080262E+01
    at position = ( 0.1203E+01, 0.2188E-01, 0.4469E+00)
    Advanced level 0 to tnext= 0.1950E+00 with dt= 0.1325E-15 CFL= 0.7685E-12 max speed= 0.3626E+02
    Advanced level 1 to tnext= 0.1950E+00 with dt= 0.6624E-16 CFL= 0.1070E-11 max speed= 0.5050E+02
    Advanced level 1 to tnext= 0.1950E+00 with dt= 0.6624E-16 CFL= 0.1070E-11 max speed= 0.5050E+02
    Info allocations = 161.6 gb 137.9 mb
    message allocations = ------ 147.9 mb
    sweep allocations = ------ 36.0 mb
    filling fractions = 0.258
    Current efficiency = 69%
    Cell updates/second = 1565 3041 51%
    Wall Time Remaining = never at frame 39.0 of 200
    AMR Speed-Up Factor = 0.9972E+00
    Restart requested
    Min Timestep Reached, Stopping, Please email astrobear_dev@pas.rochester.edu for help
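When the output is long, it can help to pull the NaN reports and time steps out of the log automatically. The sketch below is not part of AstroBEAR; the regular expressions are written only against the message format shown in the excerpt above, and the function name is made up. It lists the processors that requested a restart and the smallest dt reached.

```python
import re

# A few lines copied from the error output above (wl/wr state vectors omitted).
LOG_EXCERPT = """\
processor 3555 requesting restart due to nan in flux
at position = ( 0.1203E+01, 0.2188E-01, 0.5094E+00)
processor 3756 requesting restart due to nan in flux
at position = ( 0.1203E+01, 0.2188E-01, 0.4469E+00)
Advanced level 0 to tnext= 0.1950E+00 with dt= 0.1325E-15 CFL= 0.7685E-12 max speed= 0.3626E+02
Advanced level 1 to tnext= 0.1950E+00 with dt= 0.6624E-16 CFL= 0.1070E-11 max speed= 0.5050E+02
"""

# Patterns assumed from the excerpt; other versions of the code may print differently.
NAN_RE = re.compile(r"processor\s+(\d+)\s+requesting restart due to nan in flux")
STEP_RE = re.compile(r"Advanced level\s+(\d+)\s+to tnext=\s*(\S+)\s+with dt=\s*(\S+)\s+CFL=\s*(\S+)")


def summarize(log_text: str) -> dict:
    """Collect processors reporting NaNs and the (level, dt, CFL) of each step."""
    processors = [int(m.group(1)) for m in NAN_RE.finditer(log_text)]
    steps = [(int(m.group(1)), float(m.group(3)), float(m.group(4)))
             for m in STEP_RE.finditer(log_text)]
    min_dt = min((dt for _, dt, _ in steps), default=None)
    return {"nan_processors": processors, "steps": steps, "min_dt": min_dt}


summary = summarize(LOG_EXCERPT)
print("processors reporting NaN:", summary["nan_processors"])
print("smallest dt:", summary["min_dt"])
```

In this excerpt the time step has collapsed to about 1e-16, which matches the "Min Timestep Reached" message that ends the run.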