Research on the simulation of PF-LBM model based on MPI+CUDA mixed granularity parallel
College of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
AIP Advances, Volume 8, Issue 6, 2018
@article{zhu2018research,
title={Research on the simulation of {PF-LBM} model based on {MPI+CUDA} mixed granularity parallel},
author={Zhu, Changsheng and Liu, Jieqiong and Feng, Li and Deng, Xin},
journal={AIP Advances},
volume={8},
number={6},
pages={065017},
year={2018},
publisher={AIP Publishing}
}
A microstructure numerical model is a computationally intensive problem, for which simulation times are too long and simulation scales are too small. To address both problems, this article uses MPI+CUDA hybrid-granularity heterogeneous parallel computing to implement dendrite growth simulation with a 3D PF-LBM phase-field model. The Message Passing Interface (MPI) performs the coarse-grained division, breaking through the simulation-scale limit of a single machine. Within each node, the Compute Unified Device Architecture (CUDA) performs the fine-grained division, achieving full intra-node parallelism and improving overall computational efficiency. The article also proposes a "pseudo three-dimensional array" programming method for CUDA and improves the CUDA random number generation method, in order to simplify CUDA array programming and reduce random number generation time. Experiments show that, at the same simulation scale, the speed-up ratio with 21-node MPI+CUDA was 57, which is 54% higher than with 21-node MPI alone. At comparable computing efficiency, the largest simulation scale with 21-node MPI+CUDA was 420³, which is 13 times that of a single GPU. The proposed MPI+CUDA hybrid-granularity parallel method therefore combines the high computational efficiency of the GPU with the ability of MPI to expand the simulation scale.
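The two levels of division described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the domain is split across nodes or how the "pseudo three-dimensional array" is defined, so everything here is an assumption: the coarse level is shown as a 1D slab decomposition along z, and the fine level as a flat buffer addressed with 3D coordinates, a layout commonly used in CUDA kernels. The function names `slab_bounds`, `flat_index`, and `unflatten` are hypothetical and do not come from the paper.

```python
def slab_bounds(nz, n_ranks, rank):
    """Coarse level (assumed): split nz layers along z into n_ranks
    contiguous slabs and return the half-open range [start, stop)
    owned by `rank`. Leftover layers go to the lowest ranks."""
    base, extra = divmod(nz, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def flat_index(x, y, z, nx, ny):
    """Fine level (assumed): map (x, y, z) to an offset in a flat
    1D buffer, with x varying fastest."""
    return x + nx * (y + ny * z)


def unflatten(i, nx, ny):
    """Inverse of flat_index: recover (x, y, z) from a flat offset."""
    z, rem = divmod(i, nx * ny)
    y, x = divmod(rem, nx)
    return x, y, z


if __name__ == "__main__":
    # Hypothetical grid: 100 z-layers shared by 21 ranks.
    for rank in (0, 20):
        print(rank, slab_bounds(100, 21, rank))
```

In a real MPI+CUDA code each rank would also exchange boundary (halo) layers with its neighbours, and each GPU thread would compute its own flat offset from its block and thread indices. Both steps are omitted here.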
June 24, 2018 by hgpu