| 173 | One of the goals in designing AstroBEAR was to keep the algorithms simple and to have all data manipulation routines operate on individual patches (or pairs of patches). However, once the stencil dependencies are explicitly stated, it becomes possible to make the hyperbolic advances much more sensitive to the available data. Each processor can then perform computations required by neighboring patches only once instead of twice, and can skip computations that would update coarse cells only to have those values replaced with data from fine cells. For small adjacent patches or completely refined grids this can reduce the amount of computation by 50-100%. To this end we implemented a super-gridding scheme, which was basically as follows: |
| 174 | * Collect physically adjacent grids on each processor into super-grids. |
| 175 | * For each supergrid, flag the cells that need to be updated. |
| 176 | * Using the stencil dependencies, work backwards to flag the locations where each stencil piece needs to be calculated. |
| 177 | * Then sweep across the supergrid performing only the necessary computations. |
| 178 | |
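The backward flagging step can be illustrated with a minimal 1D sketch. It is not AstroBEAR's implementation: the stage names (`flux`, `reconstruct`), their offsets, and the helper names `dilate` and `flag_backwards` are assumptions chosen for illustration, and dense Boolean masks are used for simplicity.

```python
def dilate(mask, lo, hi):
    """Flag index i+k (lo <= k <= hi) for every flagged index i, clipped to the grid."""
    n = len(mask)
    out = [False] * n
    for i, flagged in enumerate(mask):
        if flagged:
            for k in range(lo, hi + 1):
                if 0 <= i + k < n:
                    out[i + k] = True
    return out


def flag_backwards(update_mask, stages):
    """Work backwards from the cells to update.

    `stages` is ordered last-computed first. Each entry (name, lo, hi) says
    that the piece listed before it, at index i, reads this piece at
    indices i+lo .. i+hi (hypothetical offsets, not AstroBEAR's stencils).
    """
    needed = {"update": list(update_mask)}
    current = needed["update"]
    for name, lo, hi in stages:
        current = dilate(current, lo, hi)
        needed[name] = current
    return needed


def count(mask):
    return sum(1 for flagged in mask if flagged)


# Hypothetical pipeline: a cell update reads the fluxes at faces i and i+1,
# and a flux at face i reads reconstructed states in cells i-1 and i.
STAGES = [("flux", 0, 1), ("reconstruct", -1, 0)]

# Two adjacent 2-cell patches on a 10-cell super-grid.
patch_a = [i in (2, 3) for i in range(10)]
patch_b = [i in (4, 5) for i in range(10)]
merged = [a or b for a, b in zip(patch_a, patch_b)]

separate = count(flag_backwards(patch_a, STAGES)["flux"]) \
    + count(flag_backwards(patch_b, STAGES)["flux"])
together = count(flag_backwards(merged, STAGES)["flux"])
```

In this toy setup the two patches flagged separately require six flux evaluations, while the merged super-grid requires five, because the flux on the shared face is flagged only once.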
| 179 | Of course, storing global mask arrays over the entire supergrid for each stencil piece is memory intensive, so instead we implemented a sparse mask storage that was essentially a collection of non-intersecting boxes marking the regions to calculate. |
| 180 | |
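One way such a sparse mask could be represented is sketched below. This is an assumption about the general technique, not the actual AstroBEAR data structure: a box is a hypothetical `(lo, hi)` pair of inclusive integer corner tuples, and `subtract`, `add_box`, `volume`, and `is_flagged` are illustrative names.

```python
def subtract(a, b):
    """Return disjoint boxes covering the part of box `a` that lies outside box `b`."""
    (alo, ahi), (blo, bhi) = a, b
    ndim = len(alo)
    # No overlap along some axis: `a` is untouched.
    if any(ahi[d] < blo[d] or bhi[d] < alo[d] for d in range(ndim)):
        return [a]
    pieces = []
    lo, hi = list(alo), list(ahi)
    for d in range(ndim):
        if lo[d] < blo[d]:
            # Slab of `a` below `b` along axis d.
            top = hi.copy()
            top[d] = blo[d] - 1
            pieces.append((tuple(lo), tuple(top)))
            lo[d] = blo[d]
        if hi[d] > bhi[d]:
            # Slab of `a` above `b` along axis d.
            bottom = lo.copy()
            bottom[d] = bhi[d] + 1
            pieces.append((tuple(bottom), tuple(hi)))
            hi[d] = bhi[d]
    return pieces


def add_box(boxes, new):
    """Add `new` to a list of non-intersecting boxes, keeping the list non-intersecting."""
    pieces = [new]
    for existing in boxes:
        pieces = [p for piece in pieces for p in subtract(piece, existing)]
    return boxes + pieces


def volume(box):
    lo, hi = box
    cells = 1
    for l, h in zip(lo, hi):
        cells *= h - l + 1
    return cells


def is_flagged(boxes, cell):
    """True if `cell` lies inside any box of the mask."""
    return any(all(lo[d] <= cell[d] <= hi[d] for d in range(len(cell)))
               for lo, hi in boxes)


# Two overlapping 4x4 regions; the 2x2 overlap is stored only once.
mask = add_box([], ((0, 0), (3, 3)))
mask = add_box(mask, ((2, 2), (5, 5)))
flagged_cells = sum(volume(b) for b in mask)
```

The memory cost scales with the number of boxes rather than with the number of cells in the supergrid, which is the motivation given above for avoiding global mask arrays.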
| 181 | We then modified the above algorithm so that processors can prioritize computations and better overlap communication with computation, as follows: |
| 182 | * Collect physically adjacent grids on each processor into super-grids. |
| 183 | * For each supergrid, flag the cells that need to be updated. |
| 184 | * Using the stencil dependencies, work backwards to flag the locations where each stencil piece needs to be calculated. |
| 185 | * Then, using the old patches with the previous level's data, determine which of those calculations can be performed prior to receiving data from overlaps, and begin performing those while waiting for overlap data. |
| 186 | * Determine which fluxes and EMFs will need to be synchronized with neighboring processors. |
| 187 | * Work backwards to determine which remaining calculations should be performed next, and perform those calculations. |
| 188 | * Send fluxes, and then, while waiting for child data, perform the remaining calculations that can be done before receiving data from children. |
| 189 | * Using data from children, continue performing calculations that can be done before receiving data from neighbors while waiting for neighbor data. |
| 190 | * Using neighbor data, complete all required calculations to advance patches. |
| 191 | |
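The step of deciding which calculations can start before overlap data arrives can be sketched in 1D as follows. This is a simplified illustration under stated assumptions: a symmetric stencil of half-width `radius`, inclusive `(lo, hi)` cell ranges, and a hypothetical helper named `split_ready`.

```python
def split_ready(need, have, radius):
    """Split the range `need` into work that can start now and work that must wait.

    need, have: inclusive (lo, hi) cell ranges on a 1D super-grid.
    A calculation at cell i reads cells i - radius .. i + radius, so it is
    ready only if that whole window lies inside `have`.
    Returns (ready, deferred): `ready` is a range or None, `deferred` a list of ranges.
    """
    lo = max(need[0], have[0] + radius)
    hi = min(need[1], have[1] - radius)
    if lo > hi:
        return None, [need]
    deferred = []
    if need[0] < lo:
        deferred.append((need[0], lo - 1))
    if hi < need[1]:
        deferred.append((hi + 1, need[1]))
    return (lo, hi), deferred


# Before overlap data arrives, only cells 0..15 hold valid data.
ready_now, waiting = split_ready(need=(0, 15), have=(0, 15), radius=2)

# Once two ghost cells per side have been received, nothing is left waiting.
ready_later, still_waiting = split_ready(need=(0, 15), have=(-2, 17), radius=2)
```

Here the interior range can be advanced while the overlap messages are in flight, and only the two edge ranges are deferred until the data arrives.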
| 192 | |
| 193 | |
| 194 | Each processor assembles its collection of patches into super-grids. Since the patches assigned to each processor are normally physically close, only one super-grid per processor is usually required. Within each supergrid, each processor then determines what calculations will be needed. For example, once processors have the locations of new patches, they can work out the various calculations that will need to be performed to update the collection of patches as a whole. They can also work out which of these calculations can be done using data currently available to the processor from the previous advance. Then, while waiting for boundary data from other processors, they can begin performing those calculations. After receiving ghost data, each processor can determine which flux calculations will need to be synchronized with neighboring processors and can work backwards to prioritize the stencil calculations needed to calculate those fluxes. After calculating the fluxes, each processor can send the flux data to neighboring processors and then finish performing the calculations needed to update the rest of the patches. |
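The ordering described above can be sketched as a small planning routine. This is only an illustration of the prioritization idea, with no real communication: the task names, the data-source labels (`local`, `overlap`, `child`, `neighbor`), and the function `plan` are all hypothetical.

```python
def plan(tasks, arrivals):
    """Return the order in which tasks would run as data sources arrive.

    tasks: list of dicts with keys
        "name"  - label for the calculation,
        "needs" - set of data sources it reads,
        "send"  - True if its result must be sent to another processor.
    arrivals: data sources in the order they are received.
    Data from the previous advance ("local") is assumed available at the start.
    """
    available = {"local"}
    pending = list(tasks)
    order = []
    for source in [None] + list(arrivals):
        if source is not None:
            available.add(source)
        ready = [t for t in pending if t["needs"] <= available]
        # Results that other processors are waiting for are computed first
        # (the sort is stable, so other tasks keep their relative order).
        ready.sort(key=lambda t: not t["send"])
        order.extend(t["name"] for t in ready)
        pending = [t for t in pending if not t["needs"] <= available]
    return order


tasks = [
    {"name": "edge stencils", "needs": {"local", "overlap"}, "send": False},
    {"name": "interior stencils", "needs": {"local"}, "send": False},
    {"name": "boundary fluxes", "needs": {"local", "overlap"}, "send": True},
    {"name": "coarse-fine update", "needs": {"local", "overlap", "child"}, "send": False},
    {"name": "final update",
     "needs": {"local", "overlap", "child", "neighbor"}, "send": False},
]

order = plan(tasks, arrivals=["overlap", "child", "neighbor"])
```

In a real code each phase would be separated by a wait on the corresponding message; the sketch only shows that the interior work runs first and that fluxes destined for other processors are moved ahead of other work that becomes ready at the same time.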