high performance computing on graphics processing units: hgpu.org

Performance Optimization of 3-D Lattice Boltzmann Flow Solver on a GPU

Nhat-Phuong Tran, Myungho Lee, Sugwon Hong

Dept. of Computer Science and Engineering, Myongji University, 116 Myongji Ro, Cheo-In Gu, Yong In, Kyung Ki Do, Korea

Scientific Programming, 2016

@article{tran2016performance,
  title={Performance Optimization of 3-D Lattice Boltzmann Flow Solver on a GPU},
  author={Tran, Nhat-Phuong and Lee, Myungho and Hong, Sugwon},
  year={2016}
}

The Lattice Boltzmann Method (LBM) is a powerful numerical method for simulating fluid flow. Its data-parallel nature makes it a promising candidate for parallel implementation on a GPU. The LBM, however, is heavily data-intensive and memory-bound. In particular, moving data to adjacent cells in the streaming phase incurs many uncoalesced memory accesses on the GPU, which degrades overall performance. Furthermore, the main computation kernels of the LBM use a large number of registers per thread, which limits the thread parallelism available at run time because the number of registers on the GPU is fixed. In this paper, we develop a high-performance parallelization of the LBM on a GPU that minimizes the overhead of uncoalesced memory accesses while improving cache locality through tiling combined with a data layout change. We also aggressively reduce register usage in the LBM kernels to increase run-time thread parallelism. Experimental results on the Nvidia Tesla K20 GPU show that our approach delivers a throughput of 1210.63 Million Lattice Updates Per Second (MLUPS).
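To make two ideas from the abstract concrete, here is a minimal Python sketch of how a data layout affects the address stride between neighbouring GPU threads, and of how an MLUPS figure is computed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the velocity set or the layouts involved, so the D3Q19 lattice, the array-of-structures versus structure-of-arrays comparison, and all function names below are hypothetical.

```python
# Assumption: a D3Q19 lattice (19 distribution values per cell).
# The abstract does not state which velocity set the paper uses.
Q = 19


def aos_index(cell, q, n_cells):
    """Array-of-structures: the Q values of one cell are stored contiguously."""
    return cell * Q + q


def soa_index(cell, q, n_cells):
    """Structure-of-arrays: one direction's values for all cells are contiguous."""
    return q * n_cells + cell


def warp_stride(index_fn, q, n_cells):
    """Element distance between the addresses read by two neighbouring threads,
    assuming thread i handles cell i and every thread reads direction q."""
    return index_fn(1, q, n_cells) - index_fn(0, q, n_cells)


def mlups(nx, ny, nz, steps, seconds):
    """Million Lattice Updates Per Second for an nx*ny*nz grid."""
    return nx * ny * nz * steps / (seconds * 1e6)


if __name__ == "__main__":
    n_cells = 64 * 64 * 64
    print("AoS stride:", warp_stride(aos_index, 0, n_cells))
    print("SoA stride:", warp_stride(soa_index, 0, n_cells))
    print("MLUPS:", mlups(100, 100, 100, 1000, 1.0))
```

A stride of 1 means that consecutive threads touch consecutive elements, which is the pattern a GPU can coalesce into few memory transactions. A stride of Q spreads the same reads across many more cache lines.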

Tags: Computer science, CUDA, Fluid dynamics, Lattice Boltzmann model, Numerical simulation, nVidia, Performance, Tesla K20

November 1, 2016 by hgpu
