Changes between Version 16 and Version 17 of ScramblerPaper


Timestamp:
05/17/11 15:05:57 (14 years ago)
Author:
Jonathan
Comment:

  • ScramblerPaper

    v16 v17  
    171171
    172172 == The Super-Gridding experiment ==
    173  One of the goals in designing AstroBEAR was to keep the algorithms simple and to have all data manipulation routines operate on individual patches (or pairs of patches).  However once the stencil dependencies are explicitly stated, it becomes possible to modify the hyperbolic advances to be much more sensitive to the available data. 
     173 One of the goals in designing AstroBEAR was to keep the algorithms simple and to have all data manipulation routines operate on individual patches (or pairs of patches).  However, once the stencil dependencies are explicitly stated, it becomes possible to modify the hyperbolic advances to be much more sensitive to the available data.  Each processor can then perform the computations shared by neighboring patches once instead of twice, and can skip computations that would update coarse cells only to have those values replaced by data from fine cells.  For small adjacent patches or completely refined grids this can reduce the amount of computation by 50-100%.  To this end we implemented a super-gridding scheme that works as follows:
     174 * Collect physically adjacent grids on each processor into super-grids.
     175 * For each super-grid, flag the cells that need to be updated.
     176 * Using the stencil dependencies, work backwards to flag the locations where each stencil piece needs to be calculated.
     177 * Then sweep across the supergrid performing only the necessary computations.
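
The backward flagging step can be illustrated with a minimal 1D sketch. The three-stage stencil chain below (`slopes`, `fluxes`, `update`) and the helper names are assumptions made for illustration, not AstroBEAR's actual stencils or routines. Starting from the cells to update, each stage's mask is obtained by dilating the previous mask by that stage's stencil.

```python
# Hypothetical 1D stencil chain, listed in execution order. Each entry gives
# the offsets [lo, hi] of the input that the stage reads to produce index i.
STAGES = [
    ("slopes", (-1, 1)),  # slope in cell i reads the state in cells i-1 .. i+1
    ("fluxes", (-1, 0)),  # flux at face i reads slopes in cells i-1 and i
    ("update", (0, 1)),   # update of cell i reads fluxes at faces i and i+1
]


def required_masks(update_cells, stages=STAGES):
    """Work backwards from the cells to update and return, for each stage,
    the set of indices where that stage has to be evaluated."""
    needed = {}
    current = set(update_cells)
    for name, (lo, hi) in reversed(stages):
        needed[name] = current
        # Indices of the input this stage reads: dilate by its stencil.
        current = {i + d for i in current for d in range(lo, hi + 1)}
    needed["state"] = current  # raw data (including ghost cells) required
    return needed


def work(masks):
    """Total number of stencil evaluations implied by a set of masks."""
    return sum(len(masks[name]) for name, _ in STAGES)


# Two adjacent patches advanced separately versus as one super-grid.
separate = work(required_masks(range(0, 4))) + work(required_masks(range(4, 8)))
merged = work(required_masks(range(0, 8)))
print(separate, merged)
```

In this toy chain the separately advanced patches repeat the flux at the shared face and the slopes on either side of it, so `merged` is smaller than `separate`; the saving grows as patches get smaller relative to the stencil width.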
     178 
     179 Storing a global mask array over the entire super-grid for each stencil piece would be memory intensive, so we instead implemented a sparse mask storage: essentially a collection of non-intersecting boxes marking the regions to calculate.
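
A sparse mask of this kind might look like the following 2D sketch. The `SparseMask` class, the `subtract` helper, and the half-open `(x0, y0, x1, y1)` box convention are all hypothetical choices for illustration; the source does not describe the actual data structure beyond "non-intersecting boxes". Each newly added box is clipped against the boxes already stored, so no cell is ever marked twice.

```python
def subtract(a, b):
    """Return disjoint boxes covering the part of box a not covered by box b.
    Boxes are half-open tuples (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Overlap of a and b; if it is empty, a survives whole.
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    if ix0 >= ix1 or iy0 >= iy1:
        return [a]
    pieces = []
    if ax0 < ix0:
        pieces.append((ax0, ay0, ix0, ay1))  # full-height slab left of overlap
    if ix1 < ax1:
        pieces.append((ix1, ay0, ax1, ay1))  # full-height slab right of overlap
    if ay0 < iy0:
        pieces.append((ix0, ay0, ix1, iy0))  # strip below the overlap
    if iy1 < ay1:
        pieces.append((ix0, iy1, ix1, ay1))  # strip above the overlap
    return pieces


class SparseMask:
    """A mask stored as a list of non-intersecting boxes."""

    def __init__(self):
        self.boxes = []

    def add(self, box):
        # Remove every already-stored region from the new box before storing.
        new = [box]
        for old in self.boxes:
            new = [piece for n in new for piece in subtract(n, old)]
        self.boxes.extend(new)

    def count(self):
        """Number of flagged cells."""
        return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in self.boxes)

    def __contains__(self, cell):
        x, y = cell
        return any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in self.boxes)


mask = SparseMask()
mask.add((0, 0, 4, 4))
mask.add((2, 2, 6, 6))  # overlaps the first box in a 2x2 corner
print(len(mask.boxes), mask.count())
```

Memory use scales with the number of boxes rather than with the area of the super-grid, which is the motivation given above for avoiding dense mask arrays.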
     180
     181 We then modified the above algorithm to allow for processors to prioritize computations to better overlap communication with computation as follows:
     182 * Collect physically adjacent grids on each processor into super-grids.
     183 * For each super-grid, flag the cells that need to be updated.
     184 * Using the stencil dependencies, work backwards to flag the locations where each stencil piece needs to be calculated.
     185 * Then, using the old patches with the previous level's data, determine which of those calculations can be performed prior to receiving data from overlaps, and begin performing those while waiting for overlap data.
     186 * Determine which fluxes and EMFs will need to be synchronized with neighboring processors.
     187 * Work backwards to determine which remaining calculations should be performed next and perform those calculations.
     188 * Send fluxes, and then, while waiting for child data, perform the remaining calculations that can be done before receiving data from children.
     189 * Using data from children, continue performing the calculations that can be done before receiving data from neighbors while waiting for neighbor data.
     190 * Using neighbor data, complete all required calculations to advance patches.
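
The prioritization idea can be sketched in 1D as follows. The stencils, index ranges, and function names are illustrative assumptions, and the example assumes only the right-hand face of the patch is shared with another processor. A calculation is "ready" when every input it reads is already available locally; among the deferred calculations, those feeding a flux that must be sent to a neighbor are scheduled first once ghost data arrives.

```python
SLOPE_STENCIL = (-1, 1)  # hypothetical: slope in cell i reads state i-1 .. i+1
FLUX_STENCIL = (-1, 0)   # hypothetical: flux at face i reads slopes i-1 and i


def inputs_of(indices, stencil):
    """Indices of the input data read when computing the given indices."""
    lo, hi = stencil
    return {i + d for i in indices for d in range(lo, hi + 1)}


def split_ready(needed, stencil, available):
    """Split calculations into those whose inputs are all available now
    and those that must wait for more data."""
    ready = {i for i in needed if inputs_of({i}, stencil) <= available}
    return ready, needed - ready


# Cells 0..7 hold data from the previous advance; ghost cells arrive later.
local_state = set(range(0, 8))
slopes_needed = set(range(-1, 9))

# Slopes that can be computed while waiting for overlap data.
early, deferred = split_ready(slopes_needed, SLOPE_STENCIL, local_state)

# Assume the flux at face 8 must be synchronized with a neighboring
# processor: the deferred slopes feeding it are computed first.
shared_faces = {8}
urgent = inputs_of(shared_faces, FLUX_STENCIL) & deferred
remaining = deferred - urgent
print(sorted(early), sorted(urgent), sorted(remaining))
```

Ordering the work this way lets the flux message be sent as early as possible, so the remaining calculations overlap with the communication.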
     191
     192 
     193 
     194 In this scheme each processor assembles its collection of patches into super-grids.  Since the patches assigned to each processor are normally physically close, only one super-grid per processor is usually required.  Within each super-grid, each processor then determines what calculations will be needed.  For example, once processors have the locations of new patches, they can work out the various calculations that will need to be performed to update the collection of patches as a whole.  They can also work out which of these calculations can be done using data currently available to the processor from the previous advance.  Then, while waiting for boundary data from other processors, they can begin performing those calculations.  After receiving ghost data, each processor can determine which flux calculations will need to be synchronized with neighboring processors and can work backwards to prioritize the stencil calculations needed to calculate those fluxes.  After calculating the fluxes, each processor can send the flux data to neighboring processors and then finish performing the calculations needed to update the rest of the patches.
    174195
    175196 == Performance Results ==