high performance computing on graphics processing units: hgpu.org

Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters

Christian Obrecht, Frederic Kuznik, Bernard Tourancheau, Jean-Jacques Roux

EDF R&D, Departement EnerBAT, 77818 Moret-sur-Loing Cedex, France

hal-00931058, (11 June 2014)

DOI:10.1016/j.parco.2013.04.001

@article{obrecht:hal-00931058,
  hal_id={hal-00931058},
  url={http://hal.archives-ouvertes.fr/hal-00931058},
  title={Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters},
  author={Obrecht, Christian and Kuznik, Fr{\'e}d{\'e}ric and Tourancheau, Bernard and Roux, Jean-Jacques},
  keywords={GPU clusters; CUDA; Lattice Boltzmann method},
  language={Anglais},
  affiliation={Centre de Thermique de Lyon – CETHIL , Laboratoire d’Informatique de Grenoble – LIG},
  pages={259-270},
  journal={Parallel Computing},
  volume={39},
  number={6-7},
  audience={internationale},
  doi={10.1016/j.parco.2013.04.001},
  year={2013},
  month={Jun},
  pdf={http://hal.archives-ouvertes.fr/hal-00931058/PDF/ACL35.pdf}
}

The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data parallel procedure and is therefore well suited to high performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implementations. Yet, to be of practical interest, GPU LBM solvers need to be able to perform large scale simulations. In the present contribution, we describe an efficient LBM implementation for CUDA GPU clusters. Our solver consists of a set of MPI communication routines and a CUDA kernel specifically designed to handle three-dimensional partitioning of the computation domain. Performance measurements were carried out on a small cluster. We show that the results are satisfactory, both in terms of data throughput and parallelisation efficiency.
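The abstract refers to three-dimensional partitioning of the computation domain but does not give the details, and the paper's actual CUDA kernel and MPI routines are not reproduced here. As a rough illustration of what such a partitioning involves, the Python sketch below shows only the index arithmetic: mapping a process rank to a block of the lattice and finding the ranks that share a face with it. All function names are hypothetical, and the sketch assumes an x-fastest rank ordering, non-periodic boundaries, and an even-as-possible split along each axis. Stencils with diagonal links would generally also need to exchange data along edges, which is omitted.

```python
from itertools import product


def rank_to_coords(rank, dims):
    """Map a linear rank to (x, y, z) block coordinates, x varying fastest."""
    px, py, _ = dims
    return (rank % px, (rank // px) % py, rank // (px * py))


def coords_to_rank(coords, dims):
    """Inverse of rank_to_coords."""
    x, y, z = coords
    px, py, _ = dims
    return x + px * (y + py * z)


def local_extent(n, parts, index):
    """Half-open [start, stop) range of block `index` when n nodes are
    split into `parts` blocks; the first n % parts blocks get one extra node."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def subdomain(rank, lattice, dims):
    """Per-axis [start, stop) ranges of the sub-domain owned by `rank`."""
    coords = rank_to_coords(rank, dims)
    return tuple(local_extent(n, p, c) for n, p, c in zip(lattice, dims, coords))


def face_neighbours(rank, dims):
    """Ranks of sub-domains sharing a face with `rank`, keyed by (axis, step).
    Assumption: non-periodic domain, so blocks on the boundary have fewer
    than six face neighbours."""
    coords = rank_to_coords(rank, dims)
    result = {}
    for axis, step in product(range(3), (-1, 1)):
        shifted = list(coords)
        shifted[axis] += step
        if 0 <= shifted[axis] < dims[axis]:
            result[(axis, step)] = coords_to_rank(shifted, dims)
    return result
```

For instance, with a 2 x 2 x 2 process grid, `face_neighbours(0, (2, 2, 2))` gives ranks 1, 2 and 4 as the neighbours of rank 0 in the positive x, y and z directions. In an MPI program the same bookkeeping is usually delegated to a Cartesian communicator instead of being written by hand.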

Tags: CUDA, Fluid dynamics, GPU cluster, Lattice Boltzmann model, MPI, nVidia, OpenMPI, Tesla M2070

June 15, 2014 by hgpu
