47 | | * Currently AstroBEAR's hyperbolic solver uses a Gudonov type unsplit integrator that utilizes the CTU+CT integration scheme (Stone & Gardiner 08). Unsplit integrators however, often require many intermediate variables be stored globally during a grid update which can require 10-50x the space required for storing the array of conservative variables. For GPU type systems, this would considerably restrict the size of a grid that could be updated. In AstroBEAR we implement a sweep method that essentially pipelines any sequence of calculations into a one dimensional pass across the grid where variables are only stored as long as they are needed. The dependencies in each calculation are explicitly stated and then used to determine the physical extent each variable is needed as well as when during the sweep the variable is first calculated and how long each calculated value is stored. This is ideally suited for GPU calculations in which a CPU could constantly be sending in new field values and retrieving updated field values while the GPU uses a minimum of memory. |
| 46 | * Currently AstroBEAR's hyperbolic solver uses a Gudonov type unsplit integrator that utilizes the CTU+CT integration scheme (Stone & Gardiner 08). Unsplit integrators however, often require many intermediate variables be stored globally during a grid update which can require 10-50x the space required for storing the array of conservative variables. For GPU type systems, this would considerably restrict the size of a grid that could be updated. In AstroBEAR we implement a sweep method that essentially pipelines any sequence of calculations into a one dimensional pass across the grid where variables are only stored as long as they are needed. This is ideally suited for GPU calculations in which a CPU could constantly be sending in new field values and retrieving updated field values while the GPU uses a minimum of memory. |
| 47 | * The pipelining is done automatically provided that the dependencies between stencil pieces is explicitly stated. For example consider a simple 1D 1st order Gudonov in which the initial state is stored in {{{q}}}, and the updated fields are stored in {{{Q}}}. The fluxes are stored in {{{fx}}}, and the left and right interface states are stored in {{{qLx}}} and {{{qRx}}} respectively. We also adopt the convention that stencil pieces stored on cell edges ({{{qLx, qRx, fx}}}) at position {{{i-1/2}}} are stored in their respective arrays with the index {{{i}}} |
| 48 | |
| 49 | || Stated dependency || Actual calculation || |
| 50 | ||Set_Dependency(qLx, q, (/-1,-1/))||qLx%data(i)=q%data(i-1)|| |
| 51 | ||Set_Dependency(qRx, q, (/0,0/))||qRx%data(i)=q%data(i)|| |
| 52 | ||Set_Dependency(fx, qLx, (/0,0/))|| fx%data(i)=riemann(qLx%data(i),qRx%data(i))|| |
| 53 | ||Set_Dependency(fx, qRx, (/0,0/)) || || |
| 54 | ||Set_Dependency{Q, fx, (/0,1/))|| Q%data(i)=q%data(i)+fx%data(i)-fx%data(i+1) || |
| 55 | ||Set_Dependency(Q, q, (/0,0/))|| || |
| 56 | * Then assuming that we have the size of Q we want updated we can set {{{Q%range=(/1,mx, 1, my, 1, mz/)}}} and work backwards to determine what range of values we need to calculate {{{fx, qRx, qLx,}}} & {{{q}}}. Without pipelining we would then simply do the following: |
| 57 | {{{ |
| 58 | FORALL(i=Q%range(1,1):Q%range(1,2), j=Q%range(2,1):Q$range(2,2), k=Q%range(3,1):Q%range(3,2)) |
| 59 | Q%data(i,j,k)=q%data(i,j,k)+fx%data(i,j,k)-fx%data(i+1,j,k)+fy%data(i,j,k)-fy%data(i,j+1,k)+fz%data(i,j,k)-fz%data(i,j,k+1) |
| 60 | END FORALL |
| 61 | }}} |
| 62 | To pipeline we have to update one slice in x at a time so instead we have |
| 63 | {{{ |
| 64 | DO index = q%range(1,1):q%range(1,2) |
| 65 | |
| 66 | IF istime(q, index, i) THEN |
| 67 | ... |
| 68 | END IF |
| 69 | |
| 70 | ... |
| 71 | ... |
| 72 | |
| 73 | |
| 74 | IF istime(Q, index, i) THEN |
| 75 | FORALL(j=Q%range(2,1):Q%range(2,2), k=Q%range(3,1):Q%range(3,2)) |
| 76 | Q%data(Q%x(i),j,k)=q%data(q%x(i),j,k)+fx%data(fx%x(i),j,k)-fx%data(fx%x(i+1),j,k)+fy%data(fy%x(i),j,k)-fy%data(fy%x(i),j+1,k)+fz%data(fz%x(i),j,k)-fz%data(fz%x(i),j,k+1) |
| 77 | END FORALL |
| 78 | END IF |
| 79 | END DO |
| 80 | }}} |
| 81 | where index is the x position within the large array that we are currently in the process of updating. The {{{%x(:)}}} is an array that maps a local physical x offset with respect to the row we are updating with an i-index in an thin array that is only as wide as is needed. Finally the istime function returns the index i that represents which position within the Q%data array we should be calculating. In the above 1D example, {{{q}}} would be three cells wide, {{{fx}}} would be two cells wide, and everyone else would be only 1 cell wide. {{{q}}} would be retrieved two cycles before updating/storing {{{Q}}}, while {{{qLx}}}, {{{qRx}}}, & {{{fx}}} would be calculated one cycle before. |
| 82 | |
| 83 | The dependencies in each calculation are explicitly stated and then used to determine the physical extent each variable is needed as well as when during the sweep the variable is first calculated and how long each calculated value is stored. |