12 | | * Many if not all current AMR codes are built upon either BoxLib or Paramesh both of which opt to store the entire AMR structure on each processor. This however requires additional memory and communication. (7 floats per patch compared to 256 floats for the data (for 2D 4x4 hydro with 2 ghost zones). While this additional memory requirement is negligible for problems run on typical cpus on 100's of processors, it can be become considerable on 1000's or 10000's of processors. Since it is expected that efficient memory use and management will be required for HPC (high performance computing) down the road, AstroBEAR uses a distributed tree algorithm in which each processor is only aware of the nodes 'surrounding' to its own nodes. This includes its coarser parent node and finer children nodes, as well as physically adjacent neighbors and temporally adjacent overlaps. |
13 | | * New Nodes |
14 | | * Create new children and determine on which processor they will go |
15 | | * Determine which overlapping nodes might have children that would overlap with its own children and send the relevant children info |
16 | | * Determine which neighboring nodes might have children that would neighbor its own children and send the relevant children info |
17 | | * Receive new children from neighbors and determine which of the neighbors' children neighbors its own children |
18 | | * Receive old children from overlaps and determine which of the overlaps' children overlaps with its own children |
19 | | * For each child that was distributed off processor, send to the child's processor the child's location, and the location of its parent, neighbors, & overlaps. |
20 | | * Old Nodes |
21 | | * Determine which overlapping nodes might have new children that would overlap with its own children and send the relevant children info |
22 | | * Receive new children from overlaps and determine which of the overlaps' children overlaps with its own children |
23 | | * For each old child that was distributed off processor, send to the child's processor the child's new overlaps. |
| 12 | * Many if not all current AMR codes are built upon either BoxLib or Paramesh both of which opt to store the entire AMR structure on each processor. This however requires additional memory and communication. (7 floats per patch compared to 256 floats for the data (for 2D 4x4 hydro with 2 ghost zones). While this additional memory requirement is negligible for problems run on typical cpus on 100's of processors, it can be become considerable on 1000's or 10000's of processors. Since it is expected that efficient memory use and management will be required for HPC (high performance computing) down the road, AstroBEAR uses a distributed tree algorithm in which each processor is only aware of the nodes 'surrounding' its own nodes. This includes its coarser parent node and finer children nodes, as well as physically adjacent neighbors and temporally adjacent preceding and succeeding overlaps. When new nodes are created, the parent nodes determine what information needs to be shared and with whom and then populates the childs local connections. The basic algorithm for updating local trees is as follows: |
| 13 | |
| 14 | || || '''New Nodes''' || '''Old Nodes''' || |
| 15 | |||||| First Step || |
| 16 | ||1|| ''Receive'' new nodes along with their parents, neighbors, and '''preceding''' overlaps from parent processors || ''Receive'' new '''succeeding''' overlaps from parent processors || |
| 17 | ||2|| Create new children and determine on which child processors they will go || || |
| 18 | ||3|| Determine which remote '''preceding''' nodes might have children that would overlap with its own children and ''send'' the relevant children info || Determine which remote '''succeeding''' nodes might have created children that would overlap with its own children and ''send'' the relevant children info || |
| 19 | ||4|| Determine which remote '''neighboring''' nodes might have children that would neighbor its own children and ''send'' the relevant children info || || |
| 20 | ||5|| Determine which local '''neighboring''' nodes have neighboring children || || |
| 21 | ||6 || Determine which local '''preceding''' nodes have children that overlap with its own || Determine which local '''succeeding''' nodes have children that overlap with its own || |
| 22 | ||7|| ''Receive'' new children from remote '''neighboring''' nodes and determine which of the neighbors' children neighbors its own children || || |
| 23 | ||8|| ''Receive'' children from remote '''preceding''' nodes and determine which of the nodes children overlaps with its own || ''Receive'' children from remote '''succeeding''' nodes and determine which of the nodes children overlaps with its own. || |
| 24 | ||9|| For each remote child, ''send'' the child's info as well as information about its parents, neighbors, & '''preceding''' overlaps. || For each remote child, ''send'' the child's '''succeeding''' overlaps. || |
| 25 | |||||| Successive Steps || |
| 26 | ||10|| Create new children and determine on which child processor they will go || || |
| 27 | ||11|| Determine which remote '''neighboring''' nodes might have old/new children that would overlap/neighbor its own new children and ''send'' the relevant children info || || |
| 28 | ||12|| Determine which local '''neighboring''' nodes might have old/new children that would overlap/neighbor its own new children || || |
| 29 | ||13|| ''Receive'' new children from remote '''neighboring''' nodes and determine which of the neighbors' children neighbors/overlaps its new/old children || || |
| 30 | ||14|| For each '''new''' remote child, ''send'' the child's information, and the information about its parent, neighbors, & '''preceding''' overlaps. || || |
| 31 | ||15|| For each '''old''' remote child, ''send'' the child's '''succeeding''' overlaps. || || |
| 32 | |||||| *Note that physically disjoint grids can overlap with each other's ghost zones - so the 1st generation of a node's neighbor's children can overlap with the nodes 2nd generation of children in steps 11-13 || |
| 33 | |||||| *See the section on distribution for calculation of parent and child processors. || |