Parallel Processing of the Building-Cube Method on a GPU Platform
Cyberscience Center, Tohoku University, Sendai 980-8578, Japan
Computers & Fluids (06 January 2011)
@article{Komatsu2011,
title={Parallel Processing of the Building-Cube Method on a {GPU} Platform},
journal={Computers \& Fluids},
volume={In Press},
year={2011},
issn={0045-7930},
doi={10.1016/j.compfluid.2010.12.019},
url={http://www.sciencedirect.com/science/article/B6V26-51WD0G3-1/2/7a998c6fef2b8b626815c2016661f2b0},
author={Kazuhiko Komatsu and Takashi Soga and Ryusuke Egawa and Hiroyuki Takizawa and Hiroaki Kobayashi and Shun Takahashi and Daisuke Sasaki and Kazuhiro Nakahashi},
keywords={Building-Cube Method, GPGPU, Multiple GPUs}
}
The Building-Cube Method (BCM), which is based on equally spaced Cartesian meshes, has been proposed as a next-generation CFD method. Because its meshes are equally spaced, BCM is well suited to highly parallel computation. This paper proposes a parallel implementation scheme of BCM for a GPU cluster system, which requires efficient hierarchical parallel processing to exploit the potential of the cluster. To obtain massive data parallelism, the proposed scheme employs the Red-Black SOR method for the pressure calculation, the most time-consuming part of BCM. By exploiting both the coarse-grain and the fine-grain parallelism of BCM, the scheme hierarchically assigns equally divided tasks to the GPU cluster system. Furthermore, to exploit the computational power of the GPUs in the cluster, the scheme employs efficient data management techniques such as coalesced data transfers and data reuse in on-chip memory. Experimental results show that the single-GPU implementation can achieve about three times the performance of the single-CPU implementation, and that the multiple-GPU implementation achieves almost ideal scalability. Finally, the possibility of further accelerating not only the pressure calculation but the whole BCM is discussed.
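The abstract does not describe the pressure solver in detail, so the following is only a minimal sequential sketch of the Red-Black SOR idea under stated assumptions. It uses a single cube of n^3 equally spaced cells (n = 8 is a hypothetical choice), the standard 7-point discrete Poisson equation, Dirichlet values held in one ghost layer, and the hypothetical function names `rb_sor_sweep` and `solve_cube`. It does not model the exchange of boundary data between neighbouring cubes, nor the paper's actual GPU mapping. Colouring cells by the parity of i + j + k means every update within one colour reads only cells of the other colour. That property turns SOR into a data-parallel computation, and the sketch makes it explicit by computing all updates of one colour before applying any of them.

```python
import math


def rb_sor_sweep(p, f, n, h, omega):
    """Perform one red-black SOR sweep over the interior of an (n+2)^3 cube.

    p[i][j][k] is the pressure; indices 0 and n+1 form a ghost layer holding
    Dirichlet boundary values. f is the right-hand side of the discrete
    Poisson equation (sum of 6 neighbours - 6*p) / h**2 = f.
    Returns the largest absolute change made to any cell.
    """
    max_delta = 0.0
    for color in (0, 1):  # 0: "red" cells (i+j+k even), 1: "black" cells (odd)
        # Phase 1: compute every update of this colour. Each cell reads only
        # neighbours of the other colour, so these updates are mutually
        # independent (on a GPU, each could be handled by its own thread).
        updates = []
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                for k in range(1, n + 1):
                    if (i + j + k) % 2 != color:
                        continue
                    nb = (p[i - 1][j][k] + p[i + 1][j][k]
                          + p[i][j - 1][k] + p[i][j + 1][k]
                          + p[i][j][k - 1] + p[i][j][k + 1])
                    gauss_seidel = (nb - h * h * f[i][j][k]) / 6.0
                    updates.append((i, j, k, omega * (gauss_seidel - p[i][j][k])))
        # Phase 2: apply the updates. Since no update read a same-colour cell,
        # the result is identical to an in-place sweep over this colour.
        for i, j, k, delta in updates:
            p[i][j][k] += delta
            max_delta = max(max_delta, abs(delta))
    return max_delta


def solve_cube(n=8, tol=1e-12, max_iters=5000):
    """Solve a model Poisson problem on one hypothetical n^3-cell cube.

    The exact solution x^2 + y^2 + z^2 (so f = 6) is reproduced exactly by
    the 7-point stencil, which makes the converged error easy to check.
    Returns (number of sweeps, max interior error).
    """
    h = 1.0 / (n + 1)
    size = n + 2

    def exact(i, j, k):
        return (i * h) ** 2 + (j * h) ** 2 + (k * h) ** 2

    # Ghost cells hold the exact solution; interior cells start at zero.
    p = [[[exact(i, j, k) if min(i, j, k) == 0 or max(i, j, k) == n + 1 else 0.0
           for k in range(size)] for j in range(size)] for i in range(size)]
    f = [[[6.0] * size for _ in range(size)] for _ in range(size)]
    # Optimal relaxation factor for this model problem: 2 / (1 + sin(pi*h)).
    omega = 2.0 / (1.0 + math.sin(math.pi * h))

    sweeps = 0
    for sweeps in range(1, max_iters + 1):
        if rb_sor_sweep(p, f, n, h, omega) < tol:
            break
    err = max(abs(p[i][j][k] - exact(i, j, k))
              for i in range(1, n + 1)
              for j in range(1, n + 1)
              for k in range(1, n + 1))
    return sweeps, err


sweeps, max_error = solve_cube()
print(f"converged in {sweeps} sweeps, max error {max_error:.1e}")
```

On a GPU, one natural (but here assumed) mapping is one kernel launch per colour with one thread per cell; the two-phase loop in `rb_sor_sweep` is simply the sequential equivalent of such a launch.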
January 14, 2011 by hgpu