high performance computing on graphics processing units: hgpu.org


Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

J. Habich, T. Zeiser, G. Hager, G. Wellein

Erlangen Regional Computing Center (RRZE), University of Erlangen-Nuremberg, Martensstr. 1, 91058 Erlangen, Germany

Advances in Engineering Software (05 March 2011)

DOI:10.1016/j.advengsoft.2010.10.007

@article{Habich2011,
  title={Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA},
  journal={Advances in Engineering Software},
  volume={In Press},
  year={2011},
  issn={0965-9978},
  doi={10.1016/j.advengsoft.2010.10.007},
  url={http://www.sciencedirect.com/science/article/B6V1P-529SW45-1/2/52767e40a0291ebe0a6bd4690c5b42bd},
  author={J. Habich and T. Zeiser and G. Hager and G. Wellein},
  keywords={CUDA}
}


This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI-express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.
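The upper limit mentioned in the abstract can be illustrated with a simple bandwidth-bound estimate. The sketch below is not taken from the paper: it assumes that each of the 19 distribution values of a cell is read once and written once per update, that no other memory traffic occurs, and that the sustained bandwidth equals a STREAM-like measurement. The function names and the 100 GB/s figure are placeholders chosen for illustration only.

```python
# Bandwidth-bound performance estimate for a D3Q19 lattice Boltzmann update.
# Assumptions (not from the paper): one load and one store per distribution
# value per cell update, and no additional memory traffic.

Q = 19  # number of discrete velocities in the D3Q19 model


def bytes_per_cell_update(bytes_per_value: int) -> int:
    """Memory traffic per cell update under the one-load, one-store assumption."""
    return 2 * Q * bytes_per_value


def max_mlups(bandwidth_gb_per_s: float, bytes_per_value: int) -> float:
    """Upper limit in million lattice-site updates per second (MLUPS)."""
    bytes_per_second = bandwidth_gb_per_s * 1e9
    return bytes_per_second / bytes_per_cell_update(bytes_per_value) / 1e6


if __name__ == "__main__":
    stream_bandwidth = 100.0  # GB/s, hypothetical placeholder, not a measured value
    for label, size in (("single", 4), ("double", 8)):
        limit = max_mlups(stream_bandwidth, size)
        print(f"{label} precision: at most {limit:.1f} MLUPS")
```

Replacing the placeholder with a bandwidth actually measured on the target device gives a ceiling against which a kernel's achieved update rate can be compared.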

Tags: CUDA, Fluid dynamics, Lattice Boltzmann model, nVidia

March 10, 2011 by hgpu


