Changes between Version 17 and Version 18 of ScramblerPaper
Timestamp: 05/17/11 17:05:05
* Then, using the old patches with the previous level's data, determine which of those calculations can be performed prior to receiving data from overlaps, and begin performing those while waiting for overlap data.
* Determine which fluxes and EMFs will need to be synchronized with neighboring processors.
* Work backwards to determine which remaining calculations need to be performed prior to sending data to neighbors, and perform those calculations.
* Send fluxes, and then perform the remaining calculations that can be done before receiving data from children while waiting for child data.
* Using data from children, continue performing calculations that can be done before receiving data from neighbors while waiting for neighbor data.
* Using neighbor data, complete all required calculations to advance the patches (a schematic of this pipelined advance appears at the end of this section).

In this scheme, each processor assembles its collection of patches into super-grids. Since the patches assigned to each processor are normally physically close, only one super-grid per processor is usually required. After constructing its super-grid, each processor determines what calculations will be needed within it. For example, once processors have the locations of the new patches, they can work out the various calculations that will need to be performed to update the collection of patches as a whole. They can also work out which of these calculations can be done using data already available from the previous advance, and can begin performing those calculations while waiting for boundary data from other processors. After receiving ghost data, each processor can determine which flux calculations will need to be synchronized with neighboring processors and can work backwards to prioritize the stencil calculations needed to produce those fluxes (illustrated by the mask-dilation sketch at the end of this section). After calculating the fluxes, each processor can send the flux data to neighboring processors and then finish the calculations needed to update the rest of the patches.

Unfortunately, the computational cost associated with keeping track of all of these logical arrays, as well as the additional shuffling of data back and forth, became comparable to the savings from the reduced number of stencil computations. It may be possible, however, to improve the algorithms for managing the sparse logical arrays and to design an efficient algorithm that avoids redundant computations on the same processor for unsplit integration schemes.

== Performance Results ==

For our weak scaling tests we advected a magnetized cylinder across the domain until it was displaced by one cylinder radius. The size of the cylinder was chosen to give a filling fraction of approximately 12.5%, so that in the AMR run the workload on the first refined level was comparable to that on the base level. The resolution of the base grid was adjusted to maintain 64^3^ cells per processor, and we found that our weak scaling for both the fixed-grid and AMR runs is reasonable out to 2048 processors (the setup arithmetic is sketched at the end of this section).

[[Image(http://www.pas.rochester.edu/~johannjc/Papers/Carroll2011/ScalingResults.png)]]

== AMR information about other codes ==
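The pipelined advance in the list above can be summarized with a short schematic: post every receive up front, then slot the independent work in between the waits. This is a minimal sketch and not Scrambler's actual implementation; the `*_update`, `boundary_fluxes`, and `restrict_children` helpers are hypothetical placeholders for the real stencil kernels, and only the mpi4py calls are real API.

{{{
#!python
# Sketch of one level advance with communication/computation overlap.
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Placeholders for the real stencil kernels (hypothetical):
def interior_update(patches): pass           # needs no remote data
def boundary_fluxes(patches, overlaps): return []  # fluxes/EMFs to sync
def pre_child_update(patches): pass          # independent of child data
def restrict_children(patches, kids): pass   # fold in child data
def finish_update(patches, neighbors): pass  # complete the advance

def advance_level(patches, overlap_ranks, child_ranks, neighbor_ranks):
    # Post every receive first so transfers proceed while we compute.
    overlap_reqs  = [comm.irecv(source=r, tag=0) for r in overlap_ranks]
    child_reqs    = [comm.irecv(source=r, tag=1) for r in child_ranks]
    neighbor_reqs = [comm.irecv(source=r, tag=2) for r in neighbor_ranks]

    interior_update(patches)            # work with no remote dependencies

    overlaps = [r.wait() for r in overlap_reqs]
    fluxes = boundary_fluxes(patches, overlaps)
    sends = [comm.isend(fluxes, dest=r, tag=2) for r in neighbor_ranks]

    pre_child_update(patches)           # more local work while children finish

    restrict_children(patches, [r.wait() for r in child_reqs])
    finish_update(patches, [r.wait() for r in neighbor_reqs])
    for s in sends:
        s.wait()                        # ensure outgoing fluxes are delivered
}}}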
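The backward sweep in the super-grid paragraph above, where a processor starts from the fluxes that must be synchronized and works out which intermediate stencil results those fluxes depend on, can be pictured as repeatedly dilating a boolean "needed" mask, one dilation per intermediate stage. This toy sketch assumes a stencil radius of one cell and periodic wrap-around purely for brevity; it illustrates the idea behind the sparse logical arrays rather than reproducing them.

{{{
#!python
# Toy backward dependency sweep on a 2D grid of boolean "needed" flags.
import numpy as np

def dilate(mask, radius=1):
    # Mark every cell within `radius` of a needed cell along each axis.
    out = mask.copy()
    for axis in (0, 1):
        for shift in range(1, radius + 1):
            out |= np.roll(mask, shift, axis) | np.roll(mask, -shift, axis)
    return out

n, stages = 16, 3                  # toy grid size, number of stencil stages
needed = np.zeros((n, n), dtype=bool)
needed[0, :] = True                # fluxes on this face must be synchronized

# Work backwards: each earlier stage must be evaluated on a region one
# stencil radius larger than the stage that consumes it.
for stage in range(stages, 0, -1):
    print(f"stage {stage}: {int(needed.sum())} cells required")
    needed = dilate(needed)
}}}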
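Finally, the weak scaling setup can be checked with back-of-the-envelope arithmetic: the base resolution that keeps 64^3^ cells per processor, and the cylinder radius giving the quoted 12.5% filling fraction. The unit-cube domain with the cylinder spanning it is an assumption made here for illustration; the text above does not fix the domain shape.

{{{
#!python
# Weak scaling bookkeeping: base grid size and cylinder radius.
import math

cells_per_proc = 64 ** 3
for procs in (1, 8, 64, 512, 2048):
    # Base resolution that keeps ~64^3 cells on each processor.
    n = round((cells_per_proc * procs) ** (1 / 3))
    print(f"{procs:5d} processors -> base grid ~ {n}^3")

# Filling fraction f = pi r^2 L / V; assuming a unit cube (L = V = 1):
f = 0.125
r = math.sqrt(f / math.pi)
print(f"cylinder radius for a {f:.1%} filling fraction: r = {r:.3f}")
}}}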