The growing size of scientific simulations can no longer be accommodated simply by increasing the number of nodes in a cluster. Completing larger jobs without increasing the wall time requires a decrease in the workload per processor (i.e., increased parallelism). Unfortunately, increased parallelism often leads to increased communication time. Minimizing the cost of this communication requires efficient parallel algorithms to manage the distributed AMR structure and calculations.

AstroBEAR's strength lies in its distributed tree structure. Many AMR codes replicate the entire AMR tree on each computational node. This approach incurs a heavy communication cost as the tree continuously broadcasts structural changes to all processors. AstroBEAR, on the other hand, keeps only as much tree information as the local grids need to communicate with their neighbors. This saves both memory and communication time, leaving us well positioned to take advantage of low-memory architectures such as BlueGene systems and GPUs.

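The per-processor bookkeeping this implies can be sketched as follows. This is a minimal illustration, not AstroBEAR's actual data structures: all type and field names here are hypothetical. The key point is that a rank stores full data only for its own grids, plus lightweight stubs (owning rank and bounding box) for the parents, neighbors, and children it must exchange data with, so memory scales with the local grid count rather than the global tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GridStub:
    """Lightweight handle for a remote grid: just enough to know
    whom to talk to and which region overlaps -- no field data."""
    owner_rank: int
    bbox: tuple  # e.g. ((xlo, xhi), (ylo, yhi))

@dataclass
class LocalGrid:
    """A grid owned by this rank.  The global tree is never stored;
    only stubs for the relatives this grid communicates with."""
    bbox: tuple
    parent: Optional[GridStub] = None
    neighbors: List[GridStub] = field(default_factory=list)
    children: List[GridStub] = field(default_factory=list)

# A rank's view of the AMR hierarchy is just its own grids:
my_grids = [
    LocalGrid(bbox=((0, 8), (0, 8)),
              parent=GridStub(owner_rank=0, bbox=((0, 16), (0, 16))),
              neighbors=[GridStub(owner_rank=2, bbox=((8, 16), (0, 8)))]),
]
```

Structural changes elsewhere in the mesh never reach this rank unless they alter one of these stubs, which is what avoids the broadcast cost of a replicated tree.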
Additionally, Scrambler uses a distributed control structure that mirrors the nested AMR grid hierarchy: processors have child processors just as AMR grids have child grids. These child processors receive their new grids, their own child processors, and all necessary tree information from their parent. This eliminates the need for any global communication. Processors need only communicate with parent processors (processors containing parent grids), neighbor processors (processors containing adjacent grids), overlapping processors (processors containing previous AMR grids that overlap with the processor's current grids), and child processors (processors assigned to child grids).

This does present a challenge for load balancing, since the total current workload on any given level can only be determined through collective communication after every grid has created its children. However, since regions of refinement in AMR simulations typically change slowly, the previous regions of refinement can be used to predict the future regions of refinement and the amount of resources that should be allocated to any given region. This allows the distribution itself to be parallelized. Finally, the allocation of resources among child grids is done using a Hilbert space-filling curve, which places neighboring grids on processors that are physically close on the network (or on the same core) and allows Scrambler to take advantage of the network topology.
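The Hilbert-curve allocation can be illustrated with a small 2D sketch. This is not AstroBEAR's implementation, and the function names and the greedy splitting rule are assumptions for illustration only: grids are ordered along the curve (using the standard xy-to-index conversion for a power-of-two index space) and then divided into contiguous chunks of roughly equal predicted workload, so each processor's share stays spatially clustered.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) on an n x n grid (n a power of two) to its
    position along the Hilbert curve (standard xy2d algorithm)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:           # rotate the quadrant so the curve connects
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def distribute(grid_coords, predicted_costs, nprocs, n):
    """Assign each grid to a processor: sort grids along the Hilbert
    curve, then cut the ordering into nprocs contiguous chunks of
    roughly equal predicted cost (a simple greedy split)."""
    order = sorted(range(len(grid_coords)),
                   key=lambda i: hilbert_index(n, *grid_coords[i]))
    target = sum(predicted_costs) / nprocs
    assignment, acc, proc = {}, 0.0, 0
    for i in order:
        assignment[i] = proc
        acc += predicted_costs[i]
        if acc >= target and proc < nprocs - 1:
            proc, acc = proc + 1, 0.0
    return assignment
```

Because consecutive positions on a Hilbert curve are always adjacent cells, each processor's chunk of the ordering is a compact spatial region, which is what keeps neighbor communication local to nearby ranks.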