Changes between Version 13 and Version 14 of ScramblerPaper


Ignore:
Timestamp:
05/16/11 17:03:33 (14 years ago)
Author:
Jonathan
Comment:

Legend:

Unmodified
Added
Removed
Modified
  • ScramblerPaper

    v13 v14  
    4545* Hyperbolic engine
    4646 * Currently AstroBEAR's hyperbolic solver uses a Gudonov type unsplit integrator that utilizes the CTU+CT integration scheme (Stone & Gardiner 08).  Unsplit integrators however, often require many intermediate variables be stored globally during a grid update which can require 10-50x the space required for storing the array of conservative variables.  For GPU type systems, this would considerably restrict the size of a grid that could be updated.  In AstroBEAR we implement a sweep method that essentially pipelines any sequence of calculations into a one dimensional pass across the grid where variables are only stored as long as they are needed.  This is ideally suited for GPU calculations in which a CPU could constantly be sending in new field values and retrieving updated field values while the GPU uses a minimum of memory. 
    47  * The pipelining is done automatically provided that the dependencies between stencil pieces is explicitly stated.  For example consider a simple 2D 1st order Gudonov method in which the initial state is stored in {{{q}}}, and the updated fields are stored in {{{Q}}}.  The x and y fluxes are stored in {{{fx,fy}}}, and the left and right interface states are stored in {{{qLx,qLy}}} and {{{qRx,qRy}}} respectively.  We also adopt the convention that stencil pieces stored on cell edges ( ie {{{qLx, qRx, fx}}}) at position {{{i-1/2}}} are stored in their respective arrays with the index {{{i}}}
     47 * The pipelining is done automatically provided that the dependencies between stencil pieces is explicitly stated.  For example consider a simple 2D 1st order Gudonov method in which the initial state is stored in {{{q}}}, and the updated fields are stored in {{{Q}}}.  The x and y fluxes are stored in {{{fx,fy}}}, and the left and right interface states are stored in {{{qLx,qLy}}} and {{{qRx,qRy}}} respectively.  We also adopt the convention that stencil pieces stored on cell edges ( ie {{{qLx, qRx, fx}}}) at position {{{i-1/2}}} are stored in their respective arrays with the index {{{i}}}.  The stencil dependencies can then be expressed as:
    4848
    4949 ||  Stated dependency  ||  Actual calculation  ||
     
    5959 ||Set_Dependency(Q, fy, (/0,0,0,1/))|| +fy%data(i,j)-fy%data(i,j+1)||
    6060 ||Set_Dependency(Q, q, (/0,0/))|| ||
    61  * Then assuming that we have the size of Q we want updated we can set {{{Q%range=(/1,mx, 1, my, 1, mz/)}}} and work backwards to determine what range of values we need to calculate {{{fx, fy, qRx, qLx, qRy, qLy}}} & {{{q}}}.  To pipeline we first determine windows for each stencil that slide across the grid from left to right.  As the window crosses the stencils range we begin calculated the values for the stencil along the right edge of the window.  We then hang on to the calculated values as long as they are within the stencil's window. 
    6261
     62 * Now given the size of Q we want updated we can work backwards to determine over what range of indices we need to calculate {{{fx, fy, qRx, qLx, qRy, qLy}}} & {{{q}}}.  To pipeline the calculation we then construct a sliding window for each stencil piece that passes across the grid from left to right.  The dependencies determine the window's lead (ie how far in front of updating Q must we calculate the stencil's value) as well as the window's width (ie how long we need to hold on to values for other stencil pieces to use.  For example in the following figure we have a stencil piece {{{s}}} that leads the update of Q by two columns and whose window is 3 cells wide.
    6363
     64[[Image(http://www.pas.rochester.edu/~johannjc/Papers/Carroll2011/SweepWindow.png)]]
    6465
    65 Without pipelining we would then simply do the following:
    66 {{{
    67 FORALL(i=Q%range(1,1):Q%range(1,2), j=Q%range(2,1):Q$range(2,2), k=Q%range(3,1):Q%range(3,2))
    68   Q%data(i,j,k)=q%data(i,j,k)+fx%data(i,j,k)-fx%data(i+1,j,k)+fy%data(i,j,k)-fy%data(i,j+1,k)+fz%data(i,j,k)-fz%data(i,j,k+1)
    69 END FORALL
    70 }}}
    71 To pipeline we have to update one slice in x at a time so instead we have
    72 {{{
    73 DO index = q%range(1,1):q%range(1,2)
     66For the 2D example above this would give the following windows:
    7467
    75   IF istime(q, index, i) THEN
    76     ...
    77   END IF
    78 
    79   ...
    80   ...
    81 
    82 
    83   IF istime(Q, index, i) THEN
    84     FORALL(j=Q%range(2,1):Q%range(2,2), k=Q%range(3,1):Q%range(3,2))
    85         Q%data(Q%x(i),j,k)=q%data(q%x(i),j,k)+fx%data(fx%x(i),j,k)-fx%data(fx%x(i+1),j,k)+fy%data(fy%x(i),j,k)-fy%data(fy%x(i),j+1,k)+fz%data(fz%x(i),j,k)-fz%data(fz%x(i),j,k+1)
    86     END FORALL
    87   END IF
    88 END DO
    89 }}}
    90 where index is the x position within the large array that we are currently in the process of updating.  The {{{%x(:)}}} is an array that maps a local physical x offset with respect to the row we are updating with an i-index in an thin array that is only as wide as is needed.  Finally the istime function returns the index i that represents which position within the Q%data array we should be calculating.  In the above 1D example, {{{q}}} would be three cells wide, {{{fx}}} would be two cells wide, and everyone else would be only 1 cell wide.  {{{q}}} would be retrieved two cycles before updating/storing {{{Q}}}, while {{{qLx}}}, {{{qRx}}}, & {{{fx}}} would be calculated one cycle before.
    91 
    92 The dependencies in each calculation are explicitly stated and then used to determine the physical extent each variable is needed as well as when during the sweep the variable is first calculated and how long each calculated value is stored. 
     68[[Image(http://www.pas.rochester.edu/~johannjc/Papers/Carroll2011/SweepDemo.png)]]
    9369
    9470* Source engine