A High-productivity Framework for Multi-GPU computation of Mesh-based applications
Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan
First International Workshop on High-Performance Stencil Computations (HiStencils’14), 2014
@inproceedings{shimokawabe.2014.histencils,
author={Shimokawabe, Takashi and Aoki, Takayuki and Onodera, Naoyuki},
title={A High-productivity Framework for Multi-GPU computation of Mesh-based applications},
pages={23--30},
booktitle={Proceedings of the 1st International Workshop on High-Performance Stencil Computations},
editor={Gr{\"o}{\ss}linger, Armin and K{\"o}stler, Harald},
year={2014},
month={Jan},
address={Vienna, Austria}
}
The paper proposes a high-productivity framework for multi-GPU computation of mesh-based applications. Achieving high performance in these applications requires complicated GPU optimization techniques, which carry a relatively high implementation cost. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU code. To execute user code on multiple GPUs, the framework parallelizes this code using MPI and OpenMP. The framework also provides C++ classes for writing GPU-GPU communication effectively. Programmers write user code in C++ only and can develop code optimized for GPU supercomputers without hand-implementing complicated optimizations for GPU computation and GPU-GPU communication. As an experimental evaluation, we have implemented multi-GPU computation of a diffusion equation using this framework and achieved good weak scaling results. By using peer-to-peer access between GPUs, the framework-based diffusion computation on two NVIDIA Tesla K20X GPUs is 1.4 times faster than manually implemented code. We also show computational results of the Rayleigh-Taylor instability obtained by a 3D compressible flow computation written with this framework.
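To make the idea of a "user-written function that updates a grid point" concrete, the following is a minimal CPU-only sketch in plain C++ of an explicit 7-point diffusion update. It is not the framework's actual API: the names `Grid3D`, `DiffusionPoint`, and `apply_interior` are hypothetical, and the sequential loop in `apply_interior` merely stands in for the GPU and CPU code that the framework is described as generating.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D scalar field stored in a flat array (illustrative only,
// not a class provided by the framework).
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;

    Grid3D(std::size_t nx_, std::size_t ny_, std::size_t nz_, double init = 0.0)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, init) {}

    double& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[(k * ny + j) * nx + i];
    }
    double at(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(k * ny + j) * nx + i];
    }
};

// User-written point function: returns the new value at (i, j, k) for one
// explicit time step of the diffusion equation on a uniform grid.
// c = kappa * dt / dx^2; this explicit scheme is stable in 3D for c <= 1/6.
struct DiffusionPoint {
    double c;

    double operator()(const Grid3D& f, std::size_t i, std::size_t j,
                      std::size_t k) const {
        return f.at(i, j, k)
             + c * (f.at(i + 1, j, k) + f.at(i - 1, j, k)
                  + f.at(i, j + 1, k) + f.at(i, j - 1, k)
                  + f.at(i, j, k + 1) + f.at(i, j, k - 1)
                  - 6.0 * f.at(i, j, k));
    }
};

// Hypothetical driver: applies a point function to every interior point.
// Boundary points are copied unchanged. A framework of the kind described
// above would replace this loop with generated GPU kernels or threaded code.
template <class PointFunc>
void apply_interior(const PointFunc& func, const Grid3D& in, Grid3D& out) {
    assert(in.nx == out.nx && in.ny == out.ny && in.nz == out.nz);
    out.data = in.data;  // keep boundary values as they are
    for (std::size_t k = 1; k + 1 < in.nz; ++k)
        for (std::size_t j = 1; j + 1 < in.ny; ++j)
            for (std::size_t i = 1; i + 1 < in.nx; ++i)
                out.at(i, j, k) = func(in, i, j, k);
}
```

In this sketch the physics lives entirely in `DiffusionPoint`, while the traversal lives in `apply_interior`. That separation is what would let a framework swap the traversal for a GPU kernel or an MPI-decomposed loop without touching the user's stencil code.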
January 25, 2014 by hgpu