Changes between Version 17 and Version 18 of ScramblerPaper


Timestamp: 05/17/11 17:05:05
Author: Jonathan

  • ScramblerPaper

    v17 v18  
    185 185 * Then, using the old patches together with the previous level's data, determine which of those calculations can be performed prior to receiving data from overlaps, and begin performing those while waiting for the overlap data.
    186 186 * Determine which fluxes and EMFs will need to be synchronized with neighboring processors.
    187  * Work backwards to determine which remaining calculations should be performed next and perform those calculations.
     187 * Work backwards to determine which remaining calculations need to be performed prior to sending data to neighbors, and perform those calculations.
    188 188 * Send fluxes, then, while waiting for child data, perform the remaining calculations that do not require it.
    189 189 * Using the data from children, continue performing calculations that do not require neighbor data while waiting for it.
    190 190 * Using the neighbor data, complete all remaining calculations required to advance the patches (the overlap pattern is sketched just after this list).
    191 191
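The overlap of communication and computation described in the list above maps naturally onto nonblocking message passing. The following is a minimal sketch in C using MPI; the kernel names (`compute_interior_stencils`, `compute_boundary_stencils`), the single-neighbor exchange, and the flat-array layout are hypothetical stand-ins, not Scrambler's actual routines.

{{{
#!c
#include <mpi.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real stencil kernels. */
static void compute_interior_stencils(double *u, size_t n) { (void)u; (void)n; }
static void compute_boundary_stencils(double *u, const double *ghosts,
                                      size_t n, size_t ng)
{ (void)u; (void)ghosts; (void)n; (void)ng; }

/* Advance one patch: post the ghost-cell exchange first, do all work
 * that needs only local data while messages are in flight, and block
 * only when the ghost-dependent calculations are all that remain. */
void advance_patch(double *u, double *ghosts, size_t n, size_t ng,
                   int neighbor, MPI_Comm comm)
{
    MPI_Request reqs[2];

    MPI_Irecv(ghosts, (int)ng, MPI_DOUBLE, neighbor, 0, comm, &reqs[0]);
    MPI_Isend(u,      (int)ng, MPI_DOUBLE, neighbor, 0, comm, &reqs[1]);

    compute_interior_stencils(u, n);        /* needs no remote data */

    MPI_Wait(&reqs[0], MPI_STATUS_IGNORE);  /* ghosts have arrived  */
    compute_boundary_stencils(u, ghosts, n, ng);

    MPI_Wait(&reqs[1], MPI_STATUS_IGNORE);  /* send buffer reusable */
}
}}}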
    192  
    193  
    194 in which each processor assembles its collection of patches into super-grids.  Since the patches assigned to each processor are normally physically close together, only one super-grid per processor is usually required.  After constructing its super-grid, each processor then determines what calculations will be needed within it.  For example, once processors have the locations of the new patches, they can work out the various calculations that will need to be performed to update the collection of patches as a whole.  They can also work out which of these calculations can be done using data already available from the previous advance.  Then, while waiting for boundary data from other processors, they can begin performing those calculations.  After receiving ghost data, each processor can determine which flux calculations will need to be synchronized with neighboring processors and can work backwards to prioritize the stencil calculations needed to compute those fluxes.  After calculating the fluxes, each processor can send the flux data to its neighbors and can then finish the calculations needed to update the rest of the patches.
    195 
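The "work backwards" step described above amounts to a reverse traversal of the dependency graph, starting from the fluxes that must be synchronized. A toy illustration in C follows; the adjacency-array representation is invented for this sketch and is not the paper's actual bookkeeping.

{{{
#!c
#include <stdbool.h>

#define MAX_CALCS 64

/* Toy dependency graph: deps[i] lists the calculations whose results
 * calculation i reads; ndeps[i] is how many. */
static int  deps[MAX_CALCS][MAX_CALCS];
static int  ndeps[MAX_CALCS];
static bool high_priority[MAX_CALCS];

/* Starting from a flux that must be synchronized with a neighbor,
 * walk the graph backwards and flag everything it transitively
 * depends on, so those calculations are scheduled first. */
void mark_backwards(int calc)
{
    if (high_priority[calc])
        return;
    high_priority[calc] = true;
    for (int j = 0; j < ndeps[calc]; ++j)
        mark_backwards(deps[calc][j]);
}
}}}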
     192  Unfortunately, the computational cost associated with keeping track of all of these logical arrays, as well as the additional shuffling of data back and forth, became comparable to the savings from the reduced number of stencil computations.  It may be possible, however, to improve the algorithms for managing the sparse logical arrays and to design an efficient scheme that avoids redundant computations on the same processor for unsplit integration schemes.
     193 
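To see why that bookkeeping can rival the savings, consider the simplest form such a logical array could take: one flag per cell marking whether a stencil update is still needed. Even this minimal version, sketched below in C (not the paper's actual data structure), adds a branch and extra memory traffic to every cell visit.

{{{
#!c
#include <stdbool.h>
#include <stddef.h>

/* One flag per cell marking whether a stencil update is still needed. */
typedef struct {
    bool  *needed;
    size_t n;
} StencilMask;

/* Even this minimal masked sweep pays a branch and extra memory
 * traffic on every cell, which is the overhead described above. */
void apply_masked_stencil(double *u, const StencilMask *m)
{
    for (size_t i = 1; i + 1 < m->n; ++i)
        if (m->needed[i])
            u[i] = 0.5 * (u[i - 1] + u[i + 1]);  /* toy 1D stencil */
}
}}}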
    196 194 == Performance Results ==
    197 195
     196  For our weak scaling tests we advected a magnetized cylinder across the domain until it was displaced by one cylinder radius.  The size of the cylinder was chosen to give a filling fraction of approximately 12.5%, so that in the AMR run the work load on the first refined level was comparable to that on the base level (with a refinement ratio of 2, refined regions carry 2^3^ = 8 times as many cells, and 8 × 12.5% ≈ 100% of the base-level cell count).  The resolution of the base grid was adjusted to maintain 64^3^ cells per processor, and we found that weak scaling for both the fixed grid and AMR runs is reasonable out to 2048 processors.
     197
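In a weak scaling test the work per processor is held fixed, so the base grid grows with processor count. A small C sketch of the sizing rule described above; it assumes a cubic base grid, which real domain decompositions need not use.

{{{
#!c
#include <math.h>
#include <stdio.h>

/* Weak scaling: hold 64^3 cells per processor, so a cubic base grid
 * would have side 64 * P^(1/3).  Sizing illustration only. */
int main(void)
{
    for (int p = 1; p <= 2048; p *= 8) {
        int side = (int)(64.0 * cbrt((double)p) + 0.5);
        printf("%4d procs -> %d^3 cell base grid\n", p, side);
    }
    return 0;
}
}}}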
     198 [[Image(http://www.pas.rochester.edu/~johannjc/Papers/Carroll2011/ScalingResults.png)]]
    198 199
    199 200 == AMR information about other codes ==