high performance computing on graphics processing units: hgpu.org


MPI Parallelization of GPU-based Lattice Boltzmann Simulations

Arash Bakhtiari

Technische Universität München

Technische Universität München, 2013

In this thesis, an MPI-parallelized lattice Boltzmann method (LBM) code for a multi-GPU platform has been designed and implemented. The primary goal is to research an efficient and scalable multi-GPU LBM code that exploits advanced features of modern GPUs to apply optimization techniques, such as overlapping computation with communication, on heterogeneous CPU-GPU clusters. To this end, three overlapping techniques were designed and implemented; each exploits advanced features of the OpenCL API and the MPI standard to execute independent operations of the multi-GPU LBM simulation concurrently. Profiling tools such as Callgrind were used to identify bottlenecks, and based on the profiling results, three optimization techniques were developed to make the memory access pattern for boundary values in GPU memory more efficient. The overall performance of the software was evaluated on the MAC GPU cluster. In weak-scaling experiments on 8 GPUs, the SBK-SCQ method achieved 97% efficiency relative to a 4-GPU baseline; in strong-scaling experiments on 8 GPUs, the best result was a speedup of 2.5, delivered by the MBK-SCQ method. Unlike weak scaling, strong scaling falls well short of linear speedup because of MPI and CPU-GPU communication overheads. Contrary to expectations, more sophisticated overlapping techniques such as MBK-MCQ did not outperform simpler ones such as SBK-SCQ; techniques like MBK-MCQ suffered from the vendor driver's lack of support for advanced OpenCL features. Finally, Large Eddy Simulation (LES) with the Smagorinsky subgrid-scale turbulence model was implemented, so the software can simulate turbulent as well as laminar flows on a multi-GPU distributed-memory platform.
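The abstract does not spell out how SBK-SCQ, MBK-SCQ, or MBK-MCQ schedule their work, so the following is only a generic, hedged illustration of the idea they share: start the halo exchange, update the interior cells (which need no remote data) while it is in flight, then finish the edge cells once the halos arrive. Everything here is an assumption for illustration: a toy 1D D1Q3 BGK diffusion model, subdomains inside one process standing in for MPI ranks, and a worker thread standing in for asynchronous MPI/OpenCL transfers. The functions `collide`, `stream_cell`, `exchange_halos`, `step_overlapped`, and `run_demo` are hypothetical names, not the thesis code.

```python
# Toy sketch of overlapping halo exchange with interior work (not the thesis code).
# Assumptions: 1D D1Q3 BGK diffusion, a periodic ring of subdomains in one process
# standing in for MPI ranks, and a worker thread standing in for async transfers.
from concurrent.futures import ThreadPoolExecutor

W = (1 / 6, 2 / 3, 1 / 6)  # weights for lattice velocities -1, 0, +1
TAU = 0.8                  # BGK relaxation time in lattice units (illustrative)


def collide(cells):
    """BGK collision towards f_eq_i = w_i * rho (zero-velocity equilibrium)."""
    out = []
    for f in cells:
        rho = sum(f)
        out.append([fi - (fi - w * rho) / TAU for fi, w in zip(f, W)])
    return out


def stream_cell(post, x, left_ghost, right_ghost):
    """Pull streaming for local cell x; ghosts are only read at the subdomain edges."""
    n = len(post)
    f_plus = post[x - 1][2] if x > 0 else left_ghost        # +1 population comes from the left
    f_minus = post[x + 1][0] if x < n - 1 else right_ghost  # -1 population comes from the right
    return [f_minus, post[x][1], f_plus]


def exchange_halos(post):
    """Hypothetical stand-in for MPI: each subdomain gets its neighbours' edge populations."""
    k = len(post)
    return [(post[(r - 1) % k][-1][2], post[(r + 1) % k][0][0]) for r in range(k)]


def step_overlapped(domains, pool):
    post = [collide(d) for d in domains]
    halo = pool.submit(exchange_halos, post)  # start "communication" asynchronously
    new = [[None] * len(p) for p in post]
    for r, p in enumerate(post):              # interior cells need only local data
        for x in range(1, len(p) - 1):
            new[r][x] = stream_cell(p, x, None, None)
    ghosts = halo.result()                    # wait for the exchange to complete
    for r, p in enumerate(post):              # edge cells consume the received halos
        left, right = ghosts[r]
        for x in {0, len(p) - 1}:
            new[r][x] = stream_cell(p, x, left, right)
    return new


def step_reference(cells):
    """Undecomposed periodic step, used only to check the overlapped version."""
    post = collide(cells)
    n = len(post)
    return [[post[(x + 1) % n][0], post[x][1], post[(x - 1) % n][2]] for x in range(n)]


def run_demo(n_total=12, n_ranks=3, steps=20):
    assert n_total % n_ranks == 0, "toy decomposition needs equal-sized subdomains"
    rho0 = [1.5 if x == 5 else 1.0 for x in range(n_total)]  # small density bump
    full = [[w * rho for w in W] for rho in rho0]
    size = n_total // n_ranks
    domains = [full[r * size:(r + 1) * size] for r in range(n_ranks)]
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(steps):
            full = step_reference(full)
            domains = step_overlapped(domains, pool)
    merged = [f for d in domains for f in d]
    return full, merged


if __name__ == "__main__":
    full, merged = run_demo()
    diff = max(abs(a - b) for fa, fb in zip(full, merged) for a, b in zip(fa, fb))
    print("max difference vs. undecomposed run:", diff)
```

The design point the sketch tries to capture is that correctness does not depend on when the exchange finishes, only on the edge cells waiting for it; the overlapped run reproduces the undecomposed run cell by cell. In a real MPI/OpenCL code the thread would be replaced by non-blocking sends and receives plus asynchronous device transfers, and the achievable overlap then depends on what the runtime and driver actually execute concurrently.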
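Assuming the usual runtime-based definitions (the abstract does not state them), the two headline numbers can be read as follows, where $T_p$ is the wall-clock time on $p$ GPUs, the per-GPU lattice size is held fixed for weak scaling, and the total lattice size is held fixed for strong scaling:

```latex
E_{\mathrm{weak}}(8) = \frac{T_4}{T_8} \approx 0.97
\quad\Rightarrow\quad
T_8 \approx \frac{T_4}{0.97} \approx 1.03\,T_4,
\qquad
S(8) = \frac{T_{\mathrm{ref}}}{T_8} \approx 2.5
```

Under these definitions, doubling both the GPU count and the total lattice cost only about 3% more time per step. The abstract does not name the strong-scaling reference $T_{\mathrm{ref}}$; if it were a single GPU, the corresponding parallel efficiency would be $2.5/8 \approx 0.31$.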
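The abstract only names the turbulence model. As a hedged sketch of what Smagorinsky LES typically adds to a BGK lattice Boltzmann solver: an eddy viscosity $\nu_t = (C_{\mathrm{smag}}\,\Delta)^2 |S|$, with $|S| = \sqrt{2 S_{\alpha\beta} S_{\alpha\beta}}$, is added to the molecular viscosity $\nu_0$, and the local relaxation time follows from $\nu = c_s^2(\tau - 1/2)$ with $c_s^2 = 1/3$ in lattice units. The finite-difference strain rate, the periodic 2D grid, the function names, and the constant 0.17 are assumptions for illustration; the thesis may compute $|S|$ differently, for example locally from non-equilibrium moments of the distribution functions.

```python
# Sketch: Smagorinsky eddy viscosity folded into a per-cell BGK relaxation time.
# Assumptions: 2D periodic grid, lattice units (dx = dt = 1, c_s^2 = 1/3), and a
# finite-difference strain rate; LBM codes often use non-equilibrium moments instead.
import math


def strain_rate_magnitude(u, v, i, j):
    """|S| = sqrt(2 S_ab S_ab) at row i (y), column j (x), via central differences."""
    ny, nx = len(u), len(u[0])
    ip, im = (i + 1) % ny, (i - 1) % ny
    jp, jm = (j + 1) % nx, (j - 1) % nx
    du_dx = (u[i][jp] - u[i][jm]) / 2.0
    du_dy = (u[ip][j] - u[im][j]) / 2.0
    dv_dx = (v[i][jp] - v[i][jm]) / 2.0
    dv_dy = (v[ip][j] - v[im][j]) / 2.0
    s_xx, s_yy = du_dx, dv_dy
    s_xy = 0.5 * (du_dy + dv_dx)  # symmetric off-diagonal strain component
    return math.sqrt(2.0 * (s_xx ** 2 + s_yy ** 2 + 2.0 * s_xy ** 2))


def effective_tau(nu0, s_mag, c_smag, delta=1.0):
    """tau_eff = 3 (nu0 + nu_t) + 1/2 with nu_t = (c_smag * delta)^2 |S|."""
    nu_t = (c_smag * delta) ** 2 * s_mag
    return 3.0 * (nu0 + nu_t) + 0.5


if __name__ == "__main__":
    ny, nx, shear = 8, 8, 0.01
    u = [[shear * i for _ in range(nx)] for i in range(ny)]  # simple shear u = shear * y
    v = [[0.0] * nx for _ in range(ny)]
    s = strain_rate_magnitude(u, v, 3, 4)  # interior row, away from the periodic wrap
    print(s, effective_tau(nu0=0.01, s_mag=s, c_smag=0.17))  # c_smag is illustrative
```

For the simple shear flow in the usage example, $|S|$ equals the shear rate, so the eddy viscosity raises $\tau$ only slightly; in resolved turbulent regions with large velocity gradients, the larger local $\tau$ is what keeps the BGK update stable on a coarse lattice.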

Tags: Fluid dynamics, GPU cluster, Heterogeneous systems, Lattice Boltzmann model, MPI, nVidia, OpenCL, Tesla M2090, Thesis

October 27, 2013 by hgpu
