A High-productivity Framework for Multi-GPU computation of Mesh-based applications

Takashi Shimokawabe, Takayuki Aoki, Naoyuki Onodera

Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan

First International Workshop on High-Performance Stencil Computations (HiStencils’14), 2014

@inproceedings{shimokawabe.2014.histencils,
  author={Shimokawabe, Takashi and Aoki, Takayuki and Onodera, Naoyuki},
  title={A High-productivity Framework for Multi-GPU computation of Mesh-based applications},
  pages={23--30},
  booktitle={Proceedings of the 1st International Workshop on High-Performance Stencil Computations},
  editor={Gr{\"o}{\ss}linger, Armin and K{\"o}stler, Harald},
  year={2014},
  month={Jan},
  address={Vienna, Austria}
}

This paper proposes a high-productivity framework for multi-GPU computation of mesh-based applications. Achieving high performance in these applications requires complicated GPU optimization techniques, which carry a relatively high implementation cost. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU code. To execute user code on multiple GPUs, the framework parallelizes it with MPI and OpenMP. The framework also provides C++ classes for writing GPU-GPU communication effectively. Programmers write user code in C++ only and can develop programs optimized for GPU supercomputers without hand-coding complicated optimizations for GPU computation and GPU-GPU communication. As an experimental evaluation, we implemented multi-GPU computation of a diffusion equation with this framework and achieved good weak scaling results. By using peer-to-peer access between GPUs, the framework-based diffusion computation on two NVIDIA Tesla K20X GPUs is 1.4 times faster than a manually implemented code. We also show computational results for the Rayleigh-Taylor instability obtained from a 3D compressible flow computation written with this framework.
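To make the idea of a user-written function that updates a single grid point more concrete, here is a minimal C++ sketch for an explicit 7-point diffusion update. It is not the framework's API, which the abstract does not show: the names `GridShape`, `Diffusion3D`, and `apply_interior` are hypothetical, and the plain serial loop only stands in for the CPU code that such a framework might generate from the point function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical illustrations, not the framework's real API.

// Shape of a regular 3D grid stored in one contiguous array (x fastest).
struct GridShape {
    std::size_t nx, ny, nz;
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        return i + nx * (j + ny * k);
    }
    std::size_t size() const { return nx * ny * nz; }
};

// User-written point function: one explicit diffusion step at point (i, j, k),
//   f_new = f + c * (sum of the six neighbours - 6 * f),
// where c = kappa * dt / dx^2 is assumed to be precomputed by the caller.
struct Diffusion3D {
    double c;
    double operator()(const std::vector<double>& f, const GridShape& g,
                      std::size_t i, std::size_t j, std::size_t k) const {
        const double center = f[g.index(i, j, k)];
        const double neighbours =
            f[g.index(i - 1, j, k)] + f[g.index(i + 1, j, k)] +
            f[g.index(i, j - 1, k)] + f[g.index(i, j + 1, k)] +
            f[g.index(i, j, k - 1)] + f[g.index(i, j, k + 1)];
        return center + c * (neighbours - 6.0 * center);
    }
};

// Stand-in for a generated CPU loop: applies the point function to every
// interior point and carries boundary values over unchanged.
template <class PointFunc>
void apply_interior(const PointFunc& func, const GridShape& g,
                    const std::vector<double>& in, std::vector<double>& out) {
    assert(&in != &out);          // the update must not run in place
    assert(in.size() == g.size());
    out = in;                     // copy boundary (and all other) values first
    for (std::size_t k = 1; k + 1 < g.nz; ++k)
        for (std::size_t j = 1; j + 1 < g.ny; ++j)
            for (std::size_t i = 1; i + 1 < g.nx; ++i)
                out[g.index(i, j, k)] = func(in, g, i, j, k);
}
```

A caller would fill an input vector of `g.size()` values, invoke `apply_interior(Diffusion3D{c}, g, in, out)` once per time step, and swap the two buffers. Keeping the stencil in a point function and the traversal in a separate routine is what would let a framework replace the loop with a GPU kernel or an MPI-decomposed version without touching the user code.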

Tags: Computer science, CUDA, Diffusion equation, MPI, nVidia, Tesla K20

January 25, 2014 by hgpu
