high performance computing on graphics processing units: hgpu.org


Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Xiong QinGang, Li Bo, Xu Ji, Fang XiaoJian, Wang XiaoWei, Wang LiMin, He XianFeng, Ge Wei

State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China

Chinese Science Bulletin, 57(7), 707-715, 2012

DOI:10.1007/s11434-011-4908-y

@article{xiong2012efficient,
  title={Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units},
  author={Xiong, QG and Li, B. and Xu, J. and others},
  journal={Chin Sci Bull},
  volume={57},
  pages={707--715},
  year={2012}
}


Many-core processors, such as graphics processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although large speedups have been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively or systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA streams, OpenMP, and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results are in good agreement with the analytical solution. For both one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
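The Couette-flow validation mentioned in the abstract can be illustrated with a small single-process sketch. This is not the authors' multi-GPU code: the abstract does not state the collision model or the wall treatment, so the example assumes a D2Q9 lattice with BGK collision, halfway bounce-back at the stationary bottom wall, and momentum-corrected bounce-back at the moving top wall. The function names `equilibrium`, `simulate_couette`, and `analytic_couette`, and all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np

# D2Q9 lattice: velocities, weights, and the index of each opposite direction.
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)
opp = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution, shape (9, nx, ny)."""
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return w[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)


def macroscopic(f):
    """Density and velocity moments of the distributions."""
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    return rho, ux, uy


def simulate_couette(nx=4, ny=16, u_wall=0.05, tau=1.0, steps=6000):
    """Return the x-velocity profile u_x(y) after `steps` iterations.

    Assumed setup: periodic in x, stationary wall below row 0,
    wall moving at u_wall in +x above row ny-1 (both half a cell away).
    """
    f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny)), np.zeros((nx, ny)))
    for _ in range(steps):
        rho, ux, uy = macroscopic(f)
        # BGK collision.
        f_post = f - (f - equilibrium(rho, ux, uy)) / tau
        # Streaming (periodic in both axes; y wrap-around is overwritten below).
        for i in range(9):
            f[i] = np.roll(f_post[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
        # Wall boundary conditions on the populations entering the fluid.
        for i in range(9):
            if c[i, 1] > 0:    # entering from the stationary bottom wall
                f[i][:, 0] = f_post[opp[i]][:, 0]
            elif c[i, 1] < 0:  # entering from the moving top wall
                f[i][:, -1] = (f_post[opp[i]][:, -1]
                               + 6 * w[i] * rho[:, -1] * c[i, 0] * u_wall)
    _, ux, _ = macroscopic(f)
    return ux.mean(axis=0)


def analytic_couette(ny, u_wall):
    """Linear profile between walls located at y = -0.5 and y = ny - 0.5."""
    return u_wall * (np.arange(ny) + 0.5) / ny


if __name__ == "__main__":
    numeric = simulate_couette()
    exact = analytic_couette(16, 0.05)
    print("max abs error:", np.max(np.abs(numeric - exact)))
```

In a multi-GPU setting such as the one the abstract describes, the streaming step is where neighbouring subdomains would exchange boundary populations; here `np.roll` on a single array stands in for that exchange.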

Tags: Algorithms, CUDA, Fluid dynamics, GPU cluster, Lattice Boltzmann model, MPI, Numerical simulation, nVidia, Tesla C2050

February 22, 2012 by hgpu
