OpenACC-based Snow Simulation

Magnus Alvestad Mikalsen
Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science, Norwegian University of Science and Technology
Norwegian University of Science and Technology, 2013







In recent years, the GPU platform has risen in popularity in high performance computing due to its cost effectiveness and the high computing power offered by its many parallel cores. The GPU's computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control of a GPU's resources, they are generally considered difficult to use and can lead to complicated software design. To simplify GPGPU programming and gain more mainstream usage of GPUs, there is increased interest in moving the complexity of GPGPU programming over to the compiler. This has led to the development of OpenACC, a directive-based standard for heterogeneous computing supported by NVIDIA, Cray, PGI, CAPS and others. In this thesis, we explore using OpenACC on a high performance snow simulator code developed by the HPC-Lab at NTNU. The snow simulator consists of two main simulation components: the simulation of wind and the simulation of snow particle movement. The OpenACC version of the snow simulator is made by first updating the current CUDA version, porting it to a sequential CPU implementation, and then applying OpenACC directives to accelerate compute-intensive regions of the code. The OpenACC port is also optimized by reducing data movement between host and device using OpenACC library routines. Due to the heterogeneous nature of OpenACC, shared memory cannot be used explicitly as temporary storage and texture memory cannot be used for hardware-based interpolation and 3D caching; we show that these are the largest performance bottlenecks compared to the CUDA version. This is supported by benchmarks of the OpenACC implementation, which reaches only 40.6% of the performance of the CUDA version, with an average speedup of 3.2x, when scaling the number of snow particles simulated and using a balanced windfield dimension.
When scaling the windfield with a constant number of snow particles, 58% of the CUDA performance is reached, with an average speedup of 4.84x. The best real-time performance is found at about 1.5M snow particles when using a balanced windfield with about 524K grid cells. Using OpenACC to accelerate high performance graphical simulations can be a viable option if the goal is high code portability; however, when the goal is the best possible performance, our experience shows that it is still better to use the lower-level alternatives CUDA or OpenCL.
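As a rough illustration of the approach described in the abstract (a sequential loop annotated with OpenACC directives, with data kept on the device to limit host-device transfers), a minimal sketch in C might look like the following. It is not the thesis code: the function name `advect_particles`, its parameters, and the simple update rule are all hypothetical. The thesis reports using OpenACC library routines to reduce data movement, whereas this sketch uses the `data` directive, a different mechanism aimed at the same goal. A compiler without OpenACC support ignores the pragmas and runs the loop sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical particle update, assumed for illustration only.
 * Each particle i moves with a pre-sampled wind velocity (wx, wy, wz)
 * and falls at a constant speed fall_speed along the y axis. */
void advect_particles(float *x, float *y, float *z,
                      const float *wx, const float *wy, const float *wz,
                      int n, float dt, float fall_speed, int steps)
{
    assert(x != NULL && y != NULL && z != NULL);
    assert(wx != NULL && wy != NULL && wz != NULL);
    assert(n > 0);

    /* The data region keeps all arrays on the device for every time step,
     * so positions are copied in once and copied out once, instead of
     * being transferred around each kernel launch. */
    #pragma acc data copy(x[0:n], y[0:n], z[0:n]) \
                     copyin(wx[0:n], wy[0:n], wz[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            /* Iterations are independent, so the loop can be offloaded
             * as a parallel loop without further restructuring. */
            #pragma acc parallel loop present(x[0:n], y[0:n], z[0:n], \
                                              wx[0:n], wy[0:n], wz[0:n])
            for (int i = 0; i < n; ++i) {
                x[i] += wx[i] * dt;
                y[i] += (wy[i] - fall_speed) * dt;
                z[i] += wz[i] * dt;
            }
        }
    }
}
```

Calling `advect_particles` once with `steps` greater than one, rather than once per step, is what lets the data region span the whole simulation. Note that this sketch has no counterpart to the CUDA shared-memory and texture-memory features that the abstract identifies as the main sources of the performance gap.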