Version 19 (modified by Erica Kaminski, 10 years ago)

Scaling on Bluestreak

Colliding Flows Run - Hydro, Shear Angle 15

  • Extremely poor scaling on Bluestreak
  • Memory errors with more nodes (remedied with different hypre library)
  • Delicate balance between speeding up sims with more nodes, and encountering memory issues that kill the sims (remedied with different hypre library)
Nodes | Frames | Time to make 1 frame | Notes
32    | 32-39  | 3 hours              | wall time ran out
256   | 39-48  | 1 hour               | memory error at frame 48.4
512   | 48-49  | 1 hour               | memory error at frame 49.5
32    | 49-50  | 4 hours              | wall time ran out

[Figure: chart_1.png, image not available]

As the number of nodes increases, more patches are made, leading to more ghost zones. Thus, the global info as reported in standard out (that includes physical and ghost zones) increases with nodes.

[Figure: chart_2.png, image not available]

Peak goes down, as the amount of info is distributed over more and more cores.

[Figure: chart_4.png, image not available]

Percent efficiency as given in the standard out.

[Figure: chart_3.png, image not available]

The time to write to file increases with nodes. This can affect scaling computation (see http://astrobear.pas.rochester.edu/trac/astrobear/wiki/u/erica/ScalingBluestreak for details).

The memory error is reported in 2 places: 1) the end of astrobear.log, and 2) a 'core' file that is written to the run directory. As an example, I am attaching the memory error reports for Shear15 on 256 nodes.

Intro:

All tests were done with Shear15, about ¼ of the way through the simulation. I am assuming that the other runs will scale similarly. Astrobear was built with hypre library 2.9.0 without global partition.

Method:

In global.data, I increased the number of frames and the restart frame by a factor of 10, so the final frame went from 200 to 2000 and the restart file changed from chombo00050.hdf to chombo00500.hdf. Running this simulation then produces frames with dt = 1/10th of the original frame interval. From this I can estimate the actual run time by multiplying the run time I get for a 1/10th dt frame by 10. This allows for faster scaling tests.

Results:

Nodes | Start time | Frame times            | Avg. run time per 0.1 frame, excluding first frame (min) | Avg. time to write to file (min) | Frames/hr
32    | 3:09       | 3:43, 4:15, 4:49       | 33                                                       | 1.9                              | 0.19
128   | 3:09       | 3:29, 3:43, 3:57, 4:12 | 14.3                                                     | 3.7                              | 0.55
256   | 3:14       | 3:29, 3:42, 3:55, 4:09 | 13.3                                                     | 6.8                              | 0.83
512   | 5:17       | 5:34, 5:47, 6:00, 6:13 | 13                                                       | 9.4                              | 1.3

In the table, I do not include the first frame when averaging the run time, because it is unusually slow, presumably due to the additional time needed to reload the grids upon restart.
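As a rough check of the averages in the table, the frame-to-frame intervals can be recomputed from the clock times. The sketch below is only illustrative: the helper names (`to_minutes`, `average_interval`) are made up here, the last 512-node entry is read as 6:13, and it assumes the clock times never wrap past 12:59.

```python
def to_minutes(clock):
    """Convert an 'H:MM' clock string to minutes since the hour 0:00."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def average_interval(start, frame_times, skip_first=True):
    """Average time between consecutive frames, in minutes.

    The first interval (start -> first frame) is dropped by default,
    since it includes the time to reload the grids on restart.
    """
    stamps = [to_minutes(start)] + [to_minutes(t) for t in frame_times]
    intervals = [b - a for a, b in zip(stamps, stamps[1:])]
    if skip_first:
        intervals = intervals[1:]
    return sum(intervals) / len(intervals)


# Start time and frame times per node count, copied from the table.
runs = {
    32: ("3:09", ["3:43", "4:15", "4:49"]),
    128: ("3:09", ["3:29", "3:43", "3:57", "4:12"]),
    256: ("3:14", ["3:29", "3:42", "3:55", "4:09"]),
    512: ("5:17", ["5:34", "5:47", "6:00", "6:13"]),
}

averages = {n: average_interval(s, f) for n, (s, f) in runs.items()}
for n, avg in averages.items():
    print(f"{n:4d} nodes: {avg:.1f} min per 0.1 frame")
```

With these inputs the averages come out at about 33, 14.3, 13.3 and 13 minutes, matching the table.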

Note: as I increase the number of nodes, the time to write to file increases as well. Therefore, to get a truer estimate of the run time for one frame, we need to remove the write time before multiplying by 10, and only add it back afterward (see next section).

Frames / hour calculation:

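Start with calculating the framerate R. Since my data is given in minutes, R naturally has units of min/frame. Remember to subtract off the write time before multiplying by 10, since only the computation time scales up to a full time step. We can add the write time back afterward:

$$R = 10\,(t_{\mathrm{run}} - t_{\mathrm{write}}) + t_{\mathrm{write}}$$

where t_run is the average run time per 0.1 frame and t_write is the average time to write to file, both taken from the table above.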

So now we have a framerate in minutes/(full) frame. To convert to hours/frame, divide by 60, and to get to frames/hour, invert R.
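The calculation described above can be sketched in a few lines. This is a minimal illustration, not code used for the runs: the function name `frames_per_hour` and its `factor` argument are assumptions made here, and the inputs are the average run times and write times from the table.

```python
def frames_per_hour(avg_min, write_min, factor=10):
    """Estimate full frames per hour from a 1/10th-dt test run.

    avg_min   : average wall time per 0.1 frame, including the write (min)
    write_min : average time to write one frame to file (min)
    factor    : ratio of a full frame to a test frame (10 in these tests)
    """
    compute_min = avg_min - write_min                 # computation only
    full_frame_min = factor * compute_min + write_min  # R in min/frame
    return 60.0 / full_frame_min                       # invert, min -> hr


# (avg run time, avg write time) in minutes, per node count, from the table.
table = {32: (33.0, 1.9), 128: (14.3, 3.7), 256: (13.3, 6.8), 512: (13.0, 9.4)}

rates = {n: frames_per_hour(a, w) for n, (a, w) in table.items()}
for n, rate in rates.items():
    print(f"{n:4d} nodes: {rate:.2f} frames/hr")
```

The results agree with the Frames/hr column to within rounding.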

32 Nodes Example

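An example of how to do this for the 32 node case is as follows:

$$R_{32} = 10\,(33 - 1.9) + 1.9 = 312.9\ \mathrm{min/frame} \approx 5.2\ \mathrm{hr/frame}$$

To get frames/hour, take the inverse: $R_{32}^{-1} \approx 0.19$ frames/hr. Note that this assumes the write time will be the same at the end of the normal dt time step.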

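Similarly, for the other node counts:

$$R_{128} = 10\,(14.3 - 3.7) + 3.7 = 109.7\ \mathrm{min/frame} \approx 1.83\ \mathrm{hr/frame}, \quad R_{128}^{-1} \approx 0.55\ \mathrm{frames/hr}$$

$$R_{256} = 10\,(13.3 - 6.8) + 6.8 = 71.8\ \mathrm{min/frame} \approx 1.2\ \mathrm{hr/frame}, \quad R_{256}^{-1} \approx 0.83\ \mathrm{frames/hr}$$

$$R_{512} = 10\,(13 - 9.4) + 9.4 = 45.4\ \mathrm{min/frame} \approx 0.76\ \mathrm{hr/frame}, \quad R_{512}^{-1} \approx 1.3\ \mathrm{frames/hr}$$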

Note the scaling inefficiency — doubling the processors does not cut the run-time in half.
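To put a number on the inefficiency, one option is to compare each run's throughput with ideal linear scaling from the 32-node run. The sketch below does that using the Frames/hr values from the table. The choice of 32 nodes as the baseline and the function name `strong_scaling` are assumptions made here, and this metric is not the same as the percent efficiency printed in standard out.

```python
# Frames per hour for each node count, taken from the results table.
frames_per_hr = {32: 0.19, 128: 0.55, 256: 0.83, 512: 1.3}
base_nodes = 32  # assumed baseline for the comparison


def strong_scaling(rates, base):
    """Return {nodes: (speedup, efficiency)} relative to the baseline run.

    speedup    = rate / baseline rate
    efficiency = speedup / (nodes / base), i.e. 1.0 means ideal scaling
    """
    results = {}
    for nodes, rate in sorted(rates.items()):
        speedup = rate / rates[base]
        ideal = nodes / base
        results[nodes] = (speedup, speedup / ideal)
    return results


scaling = strong_scaling(frames_per_hr, base_nodes)
for nodes, (speedup, eff) in scaling.items():
    print(f"{nodes:4d} nodes: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

By this measure, 16 times the nodes (32 to 512) gives a little under 7 times the throughput, so well under half of the ideal speedup.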

Choosing the best set of runs:

The question is which combination gets me the most frames per hour. With 3 simulations to run and 512 nodes available, the viable options are 1) 1 job on 512 nodes, 2) 2 jobs on 256 nodes each, and 3) 2 jobs on 128 nodes each plus a 3rd on 256 nodes. Adding up the frames per hour from the table gives 1.3 for option 1, 1.66 for option 2, and 1.93 for option 3, so option 3 yields the most frames per hour.
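The comparison of options can be written out as a short script. This is only a sketch: the option labels and variable names are made up here, and it assumes each job runs at the Frames/hr rate measured for its node count in the table, regardless of what else is running.

```python
# Frames per hour for each node count, taken from the results table.
frames_per_hr = {32: 0.19, 128: 0.55, 256: 0.83, 512: 1.3}

# Each option is a list of node counts, one entry per concurrent job.
options = {
    "option 1": [512],
    "option 2": [256, 256],
    "option 3": [128, 128, 256],
}

node_budget = 512
totals = {}
for name, jobs in options.items():
    # Every option must fit within the available nodes.
    assert sum(jobs) <= node_budget, f"{name} exceeds the node budget"
    totals[name] = sum(frames_per_hr[n] for n in jobs)

best = max(totals, key=totals.get)
for name, total in totals.items():
    print(f"{name}: {total:.2f} total frames/hr")
print("best:", best)
```

All three options use the full 512 nodes, so the difference in total frames per hour comes entirely from how poorly the larger jobs scale.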
