Meeting Update

  • Created a work sheet for calculating various run parameters…
X: Linear size of the entire base grid
N: Number of cpus
D: Physical dimension of the problem
C: Cell updates per cpu per second
T: Run time
x: Linear size of the base grid on each cpu
L: Number of levels of refinement
R: Refinement ratio
F: Filling ratio between AMR levels (assumed fixed across all levels)
E: Ratio of workloads between levels (E = F R^(D+1))
  • In general, for a given linear resolution X, there will be X time steps per crossing time, and the number of cells to update will be X^D, so the total workload for a problem goes as X^(D+1). If a single cpu can update C cells/second, then we have C N T = X^(D+1)
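As a quick sanity check on this relation, a short Python snippet (all numbers here are illustrative, not from an actual run):

```python
# One crossing time = X time steps over X^D cells = X^(D+1) cell updates.
# Illustrative values: X = 512, D = 3, C = 1e6 updates/cpu/s, N = 512 cpus.
X, D, C, N = 512, 3, 1e6, 512
T = X ** (D + 1) / (N * C)   # wall time in seconds, from C N T = X^(D+1)
print(f"T = {T:.1f} s")      # -> T = 134.2 s
```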
  • Now if we divide a domain that is X^D into N pieces to distribute, then each piece will have X^D/N cells and have a linear dimension x = (X^D/N)^(1/D) = X/N^(1/D)
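A minimal numerical check of the decomposition, with illustrative values of X and N:

```python
# Splitting an X^D domain across N cpus: each piece has X^D / N cells,
# so its linear size is x = X / N^(1/D).  Illustrative numbers only.
X, D, N = 512, 3, 64
x = X / N ** (1 / D)
print(x)  # 512 / 64^(1/3) = 128
# sanity check: the N pieces tile the full domain
assert abs(N * x ** D - X ** D) / X ** D < 1e-12
```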

  • Now with weak scaling the per-cpu linear size x is kept constant - so we have X^D ~ N. If we were actually doing simulations then the wall time would go as T = X^(D+1)/(N C) ~ X ~ N^(1/D), so the 1,000-core simulation would take 10 times as long (assuming 3D) because there would be 10 times as many time steps, though each time step would take the same amount of time since x is constant
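The factor of 10 can be verified directly; the values of x and C below are hypothetical placeholders:

```python
# Weak scaling: per-cpu size x fixed, so X = x * N^(1/D) and the wall
# time T = X^(D+1) / (N C) grows as N^(1/D).
def weak_T(N, x=32.0, D=3, C=1e6):
    X = x * N ** (1 / D)
    return X ** (D + 1) / (N * C)

ratio = weak_T(1000) / weak_T(1)
print(ratio)  # ~10 for D = 3: 10x the time steps, same time per step
```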
  • For strong scaling the goal is to keep X constant, so x = X/N^(1/D) ~ N^(-1/D). The number of time steps is unchanged but the time per step goes as x^D ~ 1/N, so the overall wall time T ~ 1/N. (Of course the per-cpu memory requirements also go as 1/N, so a strong-scaling run would take 1000x as much memory on 1 core as on 1000 cores.)
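The T ~ 1/N behavior in one line, again with hypothetical X and C:

```python
# Strong scaling: X fixed, x = X / N^(1/D).  The number of time steps is
# unchanged but the time per step goes as x^D ~ 1/N, so T ~ 1/N.
def strong_T(N, X=512, D=3, C=1e6):
    return X ** (D + 1) / (N * C)

print(strong_T(1), strong_T(1000))  # wall time drops by exactly 1000x
assert abs(strong_T(1) / strong_T(1000) - 1000) < 1e-9
```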
  • For hybrid scaling the goal is to keep the wall time constant, since this is what effectively sets the resolutions we run at. So X^(D+1)/N is kept constant, i.e. X^(D+1) ~ N, and x ~ X/N^(1/D) ~ N^(1/(D+1))/N^(1/D) ~ N^(-1/[D(D+1)]), or x^D ~ N^(-1/(D+1)), which is a fairly weak dependence - so hybrid scaling is very similar to weak scaling, but there is a slight decrease in the workload per cpu because in general more cpus → more time steps → shorter time per step.
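A numerical check of the hybrid-scaling exponent (the overall constant is arbitrary, so only the slope is meaningful):

```python
import math

# Hybrid scaling: T fixed, so X ~ N^(1/(D+1)) and
# x = X / N^(1/D) ~ N^(-1/(D(D+1))).  Measure the exponent numerically.
D = 3

def x_of(N):
    X = N ** (1 / (D + 1))       # up to an overall constant
    return X / N ** (1 / D)

slope = math.log(x_of(1000) / x_of(1)) / math.log(1000)
print(slope)  # ~ -1/12 for D = 3
assert abs(slope + 1 / (D * (D + 1))) < 1e-9
```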
  • With hybrid scaling the invariant is the wall time, which can be chosen intelligently to be 1 day or 1 week… But with strong or weak scaling we have to motivate a choice for x (in the case of weak) or X (in the case of strong). The best way to do this is to choose a target number of cpus and a wall time. Then you can back out what X and x are for that number of cpus and that wall time, and use those values for the strong and weak scaling respectively.
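A sketch of that back-out step; N, T, and C below are illustrative choices, not real hardware numbers:

```python
# Back out X and x from a target cpu count and wall-time budget.
def back_out(N, T, D=3, C=1e6):
    X = (N * C * T) ** (1 / (D + 1))   # from N C T = X^(D+1)
    x = X / N ** (1 / D)               # per-cpu linear size
    return X, x

X, x = back_out(N=1000, T=86400.0)     # 1000 cpus, one-day wall time
print(f"X = {X:.0f}, x = {x:.0f}")
```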
  • Finally, with AMR: if we have a base grid with linear size X, then there will be F X^D cells marked for refinement, each of which will create R^D child cells that need to take R substeps - so for each coarse-level step there will be F X^D R^(D+1) level-1 cell updates, and since the whole simulation will consist of X coarse steps, this is a total of F X^(D+1) R^(D+1) level-1 updates. The same argument applied to level 1 gives X^(D+1) (F R^(D+1))^2 level-2 updates… So for the entire simulation there will be X^(D+1) (1 + F R^(D+1) + (F R^(D+1))^2 + … + (F R^(D+1))^L) cell updates = X^(D+1) (1-E^(L+1))/(1-E), where E = F R^(D+1). So if we want to keep the wall time constant as we add levels of AMR then we need to keep X^(D+1) (1-E^(L+1))/(1-E) a constant - so X ~ [(1-E)/(1-E^(L+1))]^(1/(D+1))
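The level-by-level sum and its closed form can be cross-checked numerically; the sample parameters (F = 0.1, R = 2, L = 4) are illustrative:

```python
# Total cell updates with L levels of refinement: the explicit sum
# X^(D+1) * (1 + E + E^2 + ... + E^L) should equal the closed form
# X^(D+1) * (1 - E^(L+1)) / (1 - E), with E = F R^(D+1).
X, D, F, R, L = 256, 3, 0.1, 2, 4
E = F * R ** (D + 1)                      # workload ratio between levels
explicit = X ** (D + 1) * sum(E ** l for l in range(L + 1))
closed = X ** (D + 1) * (1 - E ** (L + 1)) / (1 - E)
print(E, explicit, closed)
assert abs(explicit - closed) / closed < 1e-12
```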
  • And we have the 'master equation' N C T = X^(D+1) (1-E^(L+1))/(1-E)
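A worksheet-style sketch of the master equation, solved for the affordable base-grid size X; all parameter values are illustrative:

```python
# Solve N C T = X^(D+1) (1-E^(L+1))/(1-E) for X, given a machine (N, C),
# a wall-time budget T, and AMR parameters (F, R, L).
def base_grid_size(N, C, T, D, F, R, L):
    E = F * R ** (D + 1)
    amr = (1 - E ** (L + 1)) / (1 - E) if E != 1 else L + 1
    return (N * C * T / amr) ** (1 / (D + 1))

X_uni = base_grid_size(1000, 1e6, 86400.0, 3, F=0.0, R=2, L=0)  # no AMR
X_amr = base_grid_size(1000, 1e6, 86400.0, 3, F=0.1, R=2, L=4)
print(X_uni, X_amr)  # adding AMR levels shrinks the affordable base grid
```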
  • So in summary:
Weak scaling: fixed #cells per cpu
Strong scaling: fixed total #cells
Hybrid scaling: fixed wall-time
Hybrid-AMR scaling: fixed wall-time (where F is the filling fraction and L is the depth)
  • There is also the issue of grid sizes causing a slowdown in computation. In general the overhead required to update a grid of size X reduces the efficiency to X^D/(X+M)^D, where M is of order 4 or 5… If the hyperbolic advance uses up G zones then the extended ghost-zone advance will reduce the efficiency to 2 X^D/((X+2G+M)^D + (X+M)^D)… For X of 10 and G of 4 and M of 5 this gives 2000/(23^3 + 15^3) ≈ 13% efficiency - without taking into account any latency or communication bottlenecks…. This is just due to having small grids of size 10^3 - combined with extended ghost zones… If the grids are 20^3 this becomes ≈ 31%… But smaller grids mean a lot more overhead - especially as the linear size of a typical grid approaches 10-20 cells
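The efficiency estimate above as a small helper (the defaults G = 4, M = 5, D = 3 match the worked example):

```python
# Ghost-zone overhead estimate: a grid of linear size X does (X+M)^D
# worth of work for X^D useful cells; an extended ghost-zone advance of
# depth G gives efficiency = 2 X^D / ((X + 2G + M)^D + (X + M)^D).
def efficiency(X, G=4, M=5, D=3):
    return 2 * X ** D / ((X + 2 * G + M) ** D + (X + M) ** D)

print(f"{efficiency(10):.0%}")  # ~13% for 10^3 grids
print(f"{efficiency(20):.0%}")  # ~31% for 20^3 grids
```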
  • Fixed a few threading bugs…
  • Started working on thesis…
