First cut of AstroBEAR GPU

http://www.pas.rochester.edu/~shuleli/gpupro/magneticdiffusion_0019.png

Diffusion test movie here:
http://www.pas.rochester.edu/~shuleli/gpupro/magneticdiffusion.gif

Features:
The user can specify the block and thread setup in a data file. In the CUDA code we use a simple setup:
dim3 grid(XBLOCK,YBLOCK,ZBLOCK);              // blocks per grid dimension
dim3 block(mx/XBLOCK,my/YBLOCK,mz/ZBLOCK);    // threads (cells) per block

Two different memory setups are possible. One can use cudaMalloc to pack the 3D data into a 1D array and do the calculation on that array (which puts more pressure on global memory because of its poor data locality), or use cudaMalloc3D (not working yet; I think I did something wrong when dereferencing the pitched pointers). The two setups use different kernels. Ideally we would get the cudaMalloc3D setup working.
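For reference, a minimal sketch of how a cudaMalloc3D allocation is usually indexed on the device (this is my own illustration, not code from the branch; the kernel name and the launch shape are hypothetical). The key point is that the pitch is in bytes, so the row arithmetic must go through a char* before casting back to float*:

```cuda
#include <cuda_runtime.h>

// Zero element (i,j,k) of a pitched 3D float array.
// p.pitch is the padded row width in BYTES; p.ysize is the row count per slice.
__global__ void exampleKernel(cudaPitchedPtr p, int mx, int my, int mz)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // x index (fastest varying)
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // y index
    int k = blockIdx.z * blockDim.z + threadIdx.z;   // z index
    if (i >= mx || j >= my || k >= mz) return;

    char  *base  = (char *)p.ptr;
    size_t slice = p.pitch * p.ysize;                // bytes per xy slice
    float *row   = (float *)(base + k * slice + j * p.pitch);
    row[i] = 0.0f;                                   // element (i,j,k)
}

void allocateAndLaunch(int mx, int my, int mz)
{
    cudaPitchedPtr field;
    cudaExtent extent = make_cudaExtent(mx * sizeof(float), my, mz);
    cudaMalloc3D(&field, extent);                    // width argument is in bytes

    dim3 block(8, 8, 8);
    dim3 grid((mx + 7) / 8, (my + 7) / 8, (mz + 7) / 8);
    exampleKernel<<<grid, block>>>(field, mx, my, mz);

    cudaFree(field.ptr);
}
```

Forgetting that the pitch is in bytes (and indexing the raw float* directly) is the most common way a pitched-pointer dereference goes wrong.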
Launching two different kernels:
JKernel<<<grid,block>>> calculates the edge-centered currents
auxKernel<<<grid,block>>> converts the currents into aux fields and fluxes

Launching multiple kernels seems to carry only a small overhead (3~5 cycles) unless new data is copied, but it would still be worth exploring inter-block communication options, since all the currents must be updated before the aux fields are. I'm not sure whether Tesla has a hardware-supported barrier, but one can easily use an atomicAdd on a global memory address to implement a rudimentary barrier between blocks (no sense reversal is needed here, since all blocks are synchronized and their metadata destroyed when the kernel launch returns).
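A sketch of such a single-use atomic-counter barrier (my own illustration, not code from the branch). The usual caveat applies: this only works if every block of the grid is resident on the device at once; if the grid is larger than what the hardware can co-schedule, the spinning blocks starve the waiting ones and the kernel deadlocks:

```cuda
__device__ unsigned int barrierCount = 0;   // must be zeroed before each launch

// Rudimentary inter-block barrier: each block announces its arrival on a
// global counter, then spins until all nBlocks have arrived. Single-use,
// so no sense reversal is needed.
__device__ void blockBarrier(unsigned int nBlocks)
{
    __syncthreads();   // make sure the whole block has reached the barrier
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        atomicAdd(&barrierCount, 1u);                // announce arrival
        while (atomicAdd(&barrierCount, 0u) < nBlocks)
            ;                                        // spin on global memory
    }
    __syncthreads();   // release the rest of the block
}
```

The atomicAdd of zero is just an atomic read; on older hardware a volatile load works as well. In practice the back-to-back kernel launch (JKernel then auxKernel on the same stream) gives the same ordering guarantee with far fewer hazards, which is why the launch overhead matters.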

Real-time image processing: I have code written (with a C# front end though) that does real-time processing and on-screen rendering on the GPU using OpenGL. It would be nice to hook this up to the GPU kernel and render simulation results in real time (ideally with a web app that takes simple user inputs and displays simulation results live). For the above diffusion test, each frame takes 10 ms to compute and 10 ms to process (the total is not 20 ms, though, since I spawn two threads: one does the computation, the other is interrupted only when new data is available). The per-frame time could exceed 16 ms once hydrodynamics is involved, so some fine-tuning will be necessary.

This week: I will be working on finishing the TSF paper, along with some job applications. Once the paper is out, I may go back and explore the GPU work a bit more.
