Collect grids into supergrids? Or ghost stencil values between neighboring grids…
- I Computations we can do now
- II Computations we need to do
- III Computations we can do now and need to do eventually
- IV Computations we can't do now but need to do
- V Computations we would need to buffer.
- III = Intersection of I and II
- IV = II-III
- V … Difficult to determine?
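As a minimal sketch of the set relations above, assume a hypothetical 1D grid of eight cells where each set is a logical array. The variable names and the cell values are illustrative only and do not come from the code.

```python
# Hypothetical 1D example: one logical flag per cell, 8 cells in total.
can_do_now = [True, True, True, True, True, False, False, False]  # set I
need_to_do = [False, False, True, True, True, True, True, False]  # set II

# III = I intersect II: needed, and computable before ghost data arrives.
do_now = [a and b for a, b in zip(can_do_now, need_to_do)]

# IV = II - III: needed, but has to wait for ghost data.
do_later = [b and not c for b, c in zip(need_to_do, do_now)]

print([x for x, flag in enumerate(do_now) if flag])    # cells in III
print([x for x, flag in enumerate(do_later) if flag])  # cells in IV
```

Set V is not computed here, because it depends on the stencil geometry of the pieces that consume these values.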
Take III and IV and work backward. Any value that is used to perform both III and IV should be cached, although we do not need to cache or calculate any value that is only needed to calculate a further cached value. So after determining V we could remove those cells from III or IV before calculating overlaps for lower-ranking stencil pieces. However, since it is difficult to determine at each point whether III or IV will happen first, it is not clear which set to remove the cached values from before continuing to work backward. This will result in some over-caching and additional memory copying, but no additional calculations.
If we did know at which x-point the first scan would stop and the second scan would begin, it would be possible to remove V from IV left of the switch point and remove V from III right of the switch point, and perhaps greatly reduce the amount of buffering. There is no reason that a switch point could not be determined a priori, although it does reduce the flexibility of overlapping communication with computation. One could instead have a switch range, within which V is not removed from III or IV before continuing to work backward. That would give greater scheduling flexibility while still reducing the amount of additional buffering.
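One possible reading of the switch-point idea, sketched in 1D. The function name `apply_switch_point`, the cell ranges, and the convention that the first scan owns everything left of the switch are all assumptions made for illustration.

```python
def apply_switch_point(iii, iv, switch):
    """Split the overlap V between two footprints at a fixed x index.

    iii, iv: logical arrays marking the cells each sweep would touch on
    one lower-rank stencil piece. Left of `switch` the overlap stays
    with III, at or right of it the overlap stays with IV.
    """
    v = [a and b for a, b in zip(iii, iv)]  # overlap of the two footprints
    iii_out = [a and not (c and x >= switch)
               for x, (a, c) in enumerate(zip(iii, v))]
    iv_out = [b and not (c and x < switch)
              for x, (b, c) in enumerate(zip(iv, v))]
    return iii_out, iv_out

# Hypothetical footprints on an 8-cell grid.
iii = [x in (1, 2, 3, 4) for x in range(8)]
iv = [x in (3, 4, 5, 6) for x in range(8)]
iii_out, iv_out = apply_switch_point(iii, iv, switch=4)
```

With a fixed switch point the two reduced footprints no longer overlap, so nothing further down the dependency chain is marked for both sweeps. A switch range would leave the overlap untouched for indices inside the range.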
I can be worked forward (from q to Q). II is worked backward (from Q to q). III and IV then follow from I and II.
V follows from working III and IV backward and taking the intersection of the two.
What does it mean to work forward or backward? Each stencil piece has a dependency ranking, where the initial incoming q and aux fields have ranks of 1 and 2 and the final updated q and aux fields have the highest ranks. A stencil dependency always points to a lower-ranked stencil piece, and the offsets determine the geometry of that dependency. To work forward, we first store the places where we can 'calculate' or copy q. Then we look at the next higher stencil piece and determine where we can calculate it, based on where q is available and on the stencil geometry.
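A minimal 1D sketch of working forward, under stated assumptions: the piece names follow the 'q', 'w', 'f2x' naming used elsewhere in these notes, but the dependency table, the offset ranges, and the function `work_forward` are hypothetical.

```python
def work_forward(available_q, dependencies, order):
    """Fill the 'can do' (I) array of every stencil piece, lowest rank first.

    dependencies maps a piece to a list of (source_piece, lo, hi) tuples:
    cell i of the piece needs the source at cells i+lo .. i+hi.
    """
    n = len(available_q)
    can_do = {"q": list(available_q)}
    for piece in order:  # order is sorted by increasing rank
        mask = [True] * n
        for src, lo, hi in dependencies[piece]:
            for i in range(n):
                for off in range(lo, hi + 1):
                    j = i + off
                    if j < 0 or j >= n or not can_do[src][j]:
                        mask[i] = False  # a required source cell is missing
                        break
        can_do[piece] = mask
    return can_do

# Hypothetical dependency chain: q -> w -> f2x -> Q.
dependencies = {
    "w": [("q", 0, 0)],     # cell-centered, needs q in the same cell
    "f2x": [("w", -1, 0)],  # face value, needs w on both sides of the face
    "Q": [("f2x", 0, 1)],   # update, needs both faces of the cell
}
can_do = work_forward([True] * 8, dependencies, ["w", "f2x", "Q"])
```

Each step shrinks the region by the stencil width, so with q available on all eight cells the final Q can only be computed on the six interior cells.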
Working backward is similar, but now we start with the final updated Q and Aux fields and determine which calculations they depend on. Then, for each of those calculations, we determine which calculations they depend on in turn, and so on. Of course we could do all of this with boxes, which would essentially represent sparse logical arrays, but for now we will just use logical arrays.
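Working backward can be sketched the same way in 1D. The dependency table, the offsets, and the function `work_backward` are again hypothetical, and logical arrays are used rather than boxes.

```python
def work_backward(need_final, dependencies, order, final="Q"):
    """Mark the 'need to do' (II) array of every piece, highest rank first.

    dependencies maps a piece to a list of (source_piece, lo, hi) tuples:
    cell i of the piece needs the source at cells i+lo .. i+hi.
    """
    n = len(need_final)
    need = {piece: [False] * n for piece in ["q"] + order}
    need[final] = list(need_final)
    for piece in reversed(order):  # decreasing rank
        for src, lo, hi in dependencies[piece]:
            for i in range(n):
                if not need[piece][i]:
                    continue
                for off in range(lo, hi + 1):
                    j = i + off
                    if 0 <= j < n:
                        need[src][j] = True  # source cell is required
    return need

# Hypothetical dependency chain: q -> w -> f2x -> Q.
dependencies = {
    "w": [("q", 0, 0)],
    "f2x": [("w", -1, 0)],
    "Q": [("f2x", 0, 1)],
}
need_q_update = [2 <= x <= 5 for x in range(8)]  # update Q on cells 2..5
need = work_backward(need_q_update, dependencies, ["w", "f2x", "Q"])
```

Here the needed region grows by the stencil width at each step down in rank, which is the mirror image of the shrinking seen when working forward.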
We can start by calculating and storing I for each stencil piece. Then, as we work backward from II, we can discard I and immediately calculate III, IV, and V. Once we have V we can break up the cells into boxes using the NewSubGrids algorithm with a filling fraction of 1. We could also do the same thing for III and IV before we begin performing calculations.
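With a filling fraction of 1, every box must contain only flagged cells. The sketch below is not the NewSubGrids algorithm; it is a simple 1D stand-in that splits a logical array into maximal runs, and the function name `runs_to_boxes` is hypothetical.

```python
def runs_to_boxes(mask):
    """Return inclusive (lo, hi) index pairs covering exactly the True cells."""
    boxes = []
    start = None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                      # a new run begins
        elif not flag and start is not None:
            boxes.append((start, i - 1))   # the run ended on the previous cell
            start = None
    if start is not None:
        boxes.append((start, len(mask) - 1))  # run reaches the end of the grid
    return boxes

# Hypothetical V array on an 8-cell grid.
cache_cells = [False, True, True, False, False, True, True, True]
boxes = runs_to_boxes(cache_cells)
```

In more than one dimension the decomposition is no longer unique, which is where a clustering algorithm such as NewSubGrids would come in.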
Then we simply sweep across the grid twice: the III sweep is the pre-ghost sweep and the IV sweep is the post-ghost sweep. Both sweeps need their own sweep buffer, since the III sweep will be interrupted at some point. The IV sweep then takes over, and until it reaches the point where the III sweep stopped it can expect the data to be cached in the V boxes. (It therefore needs to restore the cached values into its sweep buffer and modify the regions it calculates to avoid recalculating cached values.) Once the IV sweep passes the point where the III sweep stopped, it needs to cache values in the V boxes itself. Once the IV sweep finishes and posts the communication, the III sweep can resume, although it now needs to restore cached values into its sweep buffer and modify the regions it calculates to avoid recalculating cached values.
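The interplay of the two sweeps and the cache can be illustrated with a toy 1D model. Everything here is an assumption made for illustration: the function `run_two_sweeps`, the use of dictionaries as sweep buffers, the stand-in calculation, and the footprints of the two sweeps on a single lower-rank stencil piece.

```python
def run_two_sweeps(iii_cells, iv_cells, stop):
    """Interrupt the III sweep at x = stop, run IV, then resume III."""
    cache = {}                    # values held in the V boxes
    computed = []                 # order in which cells were calculated
    v = iii_cells & iv_cells      # cells needed by both sweeps

    def sweep(cells, buf):
        for x in cells:
            if x in cache:
                buf[x] = cache[x]      # restore instead of recalculating
            else:
                buf[x] = x * x         # stand-in for the real calculation
                computed.append(x)
                if x in v:
                    cache[x] = buf[x]  # the other sweep will need this

    iii_buf, iv_buf = {}, {}           # each sweep has its own buffer
    ordered_iii = sorted(iii_cells)
    sweep([x for x in ordered_iii if x < stop], iii_buf)   # pre-ghost, interrupted
    sweep(sorted(iv_cells), iv_buf)                        # post-ghost sweep
    sweep([x for x in ordered_iii if x >= stop], iii_buf)  # III resumes
    return iii_buf, iv_buf, computed

iii_buf, iv_buf, computed = run_two_sweeps({1, 2, 3, 4}, {3, 4, 5, 6}, stop=4)
```

In this model every cell is calculated exactly once regardless of where the III sweep is interrupted. The price is that all of V is cached, which is the over-caching discussed earlier.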
With AMR the ghost zones change - but that would only modify the calculation of the II cells. Cells that will get updated on another processor and be ghosted later do not need to be updated here. Extended ghost regions that are not reproduced on another processor will still need to be updated. ChildMask allows for the supersweep to determine whether or not to include that cell.
- Group info structures into supergrid(s)
- For each stencil of each supergrid - allocate a Can Do logical array (I array)
- Fill the 'q' I array from local overlap grids.
- Work forward from 'q', filling the I arrays for 'w', 'f2x', and so on
- Once we have all of the I arrays, we can calculate the needtodo Q II array (and AUX II array) using the current grids, and then use the dependencies to mark the II arrays for lower-ranking stencil pieces. Then, working backward through the stencil pieces, we do the following:
  - Mark the II cells for all depending stencil pieces
  - Calculate the III cells and the IV cells from our own I and II cells
  - Calculate the V cells from the III and IV cells
  - Store the logical cells for III, IV, and V in boxes.
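The per-piece step in the list above can be sketched in 1D as follows. The helper `footprint`, the piece names, the offsets, and the cell ranges are hypothetical. The sketch covers a single dependency, from 'f2x' down to 'w'.

```python
def footprint(mask, lo, hi):
    """Cells of the source piece touched by the flagged cells of `mask`."""
    n = len(mask)
    out = [False] * n
    for i, flag in enumerate(mask):
        if flag:
            for off in range(lo, hi + 1):
                if 0 <= i + off < n:
                    out[i + off] = True
    return out

# Hypothetical I and II arrays for the 'f2x' piece on an 8-cell grid.
can_do = [x <= 3 for x in range(8)]     # I: cells 0..3
need = [2 <= x <= 5 for x in range(8)]  # II: cells 2..5

do_now = [a and b for a, b in zip(can_do, need)]      # III = I and II
do_later = [b and not c for b, c in zip(need, do_now)]  # IV = II - III

# 'f2x' at cell i is assumed to need 'w' at cells i-1 and i.
used_now = footprint(do_now, -1, 0)      # 'w' cells read by the III sweep
used_later = footprint(do_later, -1, 0)  # 'w' cells read by the IV sweep

# V for 'w': values read by both sweeps, so worth caching.
cache_w = [a and b for a, b in zip(used_now, used_later)]
```

In this example only one 'w' cell is shared between the two sweeps, so V is a single cell at the boundary between the III and IV regions.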
Data Structures
- Dependencies(
  - Stencil a
  - Stencil b
  - range
- Stencils(
  - Dependency lists?
  - range - deprecated
  - rank
- SuperGrid
  - SuperStencil(24)
- SuperStencil
  - I array
  - II array
  - III box list
  - IV box list
  - V bufferlist
  - III data buffer
  - IV data buffer
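For illustration, the structures listed above could be written as Python dataclasses. This is a sketch under assumptions: the field names, the 1D box type, the placement of the data buffers on SuperStencil, and the example rank are hypothetical, and the notes do not say which of "Stencil a" and "Stencil b" depends on the other.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int]  # hypothetical 1D box: inclusive (lo, hi) cell indices

@dataclass
class Dependency:
    stencil_a: str            # one end of the dependency
    stencil_b: str            # the other end
    offsets: Tuple[int, int]  # the "range": lowest and highest offset

@dataclass
class Stencil:
    name: str
    rank: int
    dependencies: List[Dependency] = field(default_factory=list)

@dataclass
class SuperStencil:
    can_do: List[bool]        # I array
    need_to_do: List[bool]    # II array
    do_now_boxes: List[Box] = field(default_factory=list)       # III box list
    do_later_boxes: List[Box] = field(default_factory=list)     # IV box list
    cache_boxes: List[Box] = field(default_factory=list)        # V buffer list
    do_now_buffer: List[float] = field(default_factory=list)    # III data buffer
    do_later_buffer: List[float] = field(default_factory=list)  # IV data buffer

@dataclass
class SuperGrid:
    stencils: Dict[str, SuperStencil]  # one SuperStencil per stencil piece

dep = Dependency(stencil_a="f2x", stencil_b="w", offsets=(-1, 0))
piece = Stencil(name="f2x", rank=4, dependencies=[dep])
grid = SuperGrid(stencils={"f2x": SuperStencil([False] * 8, [False] * 8)})
```

Keying the SuperStencil entries by name is a convenience of this sketch. The notes suggest a fixed-size array (SuperStencil(24)) instead.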