Changes between Version 7 and Version 8 of ScramblerPaper


Ignore:
Timestamp:
05/11/11 13:25:14 (14 years ago)
Author:
Jonathan
Comment:

Legend:

Unmodified
Added
Removed
Modified
  • ScramblerPaper

    v7 v8  
    1010
    1111* Distributed Tree/Control Algorithm
    12   * Many if not all current AMR codes are built upon either BoxLib or Paramesh both of which opt to store the entire AMR structure on each processor.  This however requires additional memory and communication.  (7 floats per patch compared to 256 floats for the data (for 2D 4x4 hydro with 2 ghost zones).  While this additional memory requirement is negligible for problems run on typical cpus on 100's of processors, it can be become considerable on 1000's or 10000's of processors.  Since it is expected that efficient memory use and management will be required for HPC (high performance computing) down the road, AstroBEAR uses a distributed tree algorithm in which each processor is only aware of the nodes 'surrounding' to its own nodes.  This includes its coarser parent node and finer children nodes, as well as physically adjacent neighbors and temporally adjacent overlaps. 
    13    * New Nodes
    14     * Create new children and determine on which processor they will go
    15     * Determine which overlapping nodes might have children that would overlap with its own children and send the relevant children info
    16     * Determine which neighboring nodes might have children that would neighbor its own children and send the relevant children info
    17     * Receive new children from neighbors and determine which of the neighbors' children neighbors its own children
    18     * Receive old children from overlaps and determine which of the overlaps' children overlaps with its own children
    19     * For each child that was distributed off processor, send to the child's processor the child's location, and the location of its parent, neighbors, & overlaps.
    20    * Old Nodes
    21     * Determine which overlapping nodes might have new children that would overlap with its own children and send the relevant children info
    22     * Receive new children from overlaps and determine which of the overlaps' children overlaps with its own children
    23     * For each old child that was distributed off processor, send to the child's processor the child's new overlaps.
     12  * Many if not all current AMR codes are built upon either BoxLib or Paramesh both of which opt to store the entire AMR structure on each processor.  This however requires additional memory and communication.  (7 floats per patch compared to 256 floats for the data (for 2D 4x4 hydro with 2 ghost zones).  While this additional memory requirement is negligible for problems run on typical cpus on 100's of processors, it can be become considerable on 1000's or 10000's of processors.  Since it is expected that efficient memory use and management will be required for HPC (high performance computing) down the road, AstroBEAR uses a distributed tree algorithm in which each processor is only aware of the nodes 'surrounding' its own nodes.  This includes its coarser parent node and finer children nodes, as well as physically adjacent neighbors and temporally adjacent preceding and succeeding overlaps.  When new nodes are created, the parent nodes determine what information needs to be shared and with whom and then populates the childs local connections.  The basic algorithm for updating local trees is as follows:
     13
     14  || ||  '''New Nodes'''  ||  '''Old Nodes'''  ||
     15  ||||||   First Step   ||
     16  ||1|| ''Receive'' new nodes along with their parents, neighbors, and '''preceding''' overlaps from parent processors || ''Receive'' new '''succeeding''' overlaps from parent processors ||
     17  ||2|| Create new children and determine on which child processors they will go || ||
     18  ||3|| Determine which remote '''preceding''' nodes might have children that would overlap with its own children and ''send'' the relevant children info || Determine which remote '''succeeding''' nodes might have created children that would overlap with its own children and ''send'' the relevant children info ||
     19  ||4|| Determine which remote '''neighboring''' nodes might have children that would neighbor its own children and ''send'' the relevant children info || ||
     20  ||5|| Determine which local '''neighboring''' nodes have neighboring children || ||
     21  ||6 || Determine which local '''preceding''' nodes have children that overlap with its own || Determine which local '''succeeding''' nodes have children that overlap with its own ||
     22  ||7|| ''Receive'' new children from remote '''neighboring''' nodes and determine which of the neighbors' children neighbors its own children || ||
     23  ||8|| ''Receive'' children from remote '''preceding''' nodes and determine which of the nodes children overlaps with its own || ''Receive'' children from remote '''succeeding''' nodes and determine which of the nodes children overlaps with its own. ||
     24  ||9|| For each remote child, ''send'' the child's info as well as information about its parents, neighbors, & '''preceding''' overlaps. || For each remote child, ''send'' the child's '''succeeding''' overlaps. ||
     25  ||||||   Successive Steps   ||
     26  ||10|| Create new children and determine on which child processor they will go || ||
     27  ||11|| Determine which remote '''neighboring''' nodes might have old/new children that would overlap/neighbor its own new children and ''send'' the relevant children info ||  ||
     28  ||12|| Determine which local '''neighboring''' nodes might have old/new children that would overlap/neighbor its own new children ||  ||
     29  ||13|| ''Receive'' new children from remote '''neighboring''' nodes and determine which of the neighbors' children neighbors/overlaps its new/old children || ||
     30  ||14|| For each '''new''' remote child, ''send'' the child's information, and the information about its parent, neighbors, & '''preceding''' overlaps. || ||
     31  ||15|| For each '''old''' remote child, ''send'' the child's '''succeeding''' overlaps. || ||
     32  |||||| *Note that physically disjoint grids can overlap with each other's ghost zones - so the 1st generation of a node's neighbor's children can overlap with the nodes 2nd generation of children in steps 11-13 ||
     33  |||||| *See the section on distribution for calculation of parent and child processors. ||
    2434
    2535
     
    2737* Threaded MultiLevelAdvance
    2838 * Many if not all current AMR codes tend to perform grid updates across all levels in a prescribed order (0, 1, 2, 2, 1, 2, 2, 0...)  Good performance then requires each level update to be balanced across all processors.  Load balancing each level however, often leads to artificial fragmentation of grids and requires each processor to have grids on every level.  This however, limits the depth of the AMR tree and the resulting simulations are often performed with large base grids with only 1 or 2 levels of AMR.  The benefit of AMR however, is best utilized for scenarios with large dynamic ranges - and having only 2 levels of AMR severely limits the dynamic range that can be followed.  In AstroBEAR we allow each levels grid advances to happen independently so that processors with coarse grids can update those grids while other processors with fine grids update theirs...
    29 [[Image(http://www.pas.rochester.edu/~johannjc/Papers/Carroll2010/figure1.png)]]
     39
     40[[Image(http://www.pas.rochester.edu/~johannjc/Papers/Carroll2011/figure1.png)]]
    3041
    3142* Load Balancing