Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
Department of Informatics, University of Oslo, Norway
International Journal of Parallel Programming, 45(3), 2017
@article{sourouri2017panda,
title={Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers},
author={Sourouri, Mohammed and Baden, Scott B and Cai, Xing},
journal={International Journal of Parallel Programming},
volume={45},
number={3},
pages={711--729},
year={2017},
publisher={Springer}
}
This paper describes a new compiler framework for heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil codes originally written in C can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate state-of-the-art hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. At the same time, the auto-generated hybrid codes hide the overhead of the various data movements by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes from our compiler can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. We thus believe that the user-friendliness and performance delivered by our domain-specific compiler framework allow computational scientists to harness the full power of GPU-accelerated supercomputing without painstaking coding effort.
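To make the abstract's idea of concurrent CPU+GPU stencil execution concrete, here is a minimal sketch in plain C. It is not Panda's directive syntax or its generated code, neither of which is shown in this listing. It assumes a 7-point Jacobi stencil as a stand-in for the "simple stencil benchmark", and the names `stencil_sweep`, `split_plane`, `hybrid_sweep`, and `gpu_fraction` are hypothetical. The two partial sweeps run one after the other here; in a real hybrid code the first would be a CUDA kernel and the second an OpenMP loop running at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into an nx*ny*nz grid stored with x varying fastest. */
static size_t idx(int i, int j, int k, int nx, int ny) {
    return ((size_t)k * ny + j) * nx + i;
}

/* One Jacobi sweep of a 7-point stencil over interior planes k in [k0, k1).
 * This is the kind of sequential loop nest a stencil compiler takes as input. */
void stencil_sweep(const double *in, double *out,
                   int nx, int ny, int nz, int k0, int k1) {
    assert(k0 >= 1 && k1 <= nz - 1 && k0 <= k1);
    for (int k = k0; k < k1; k++)
        for (int j = 1; j < ny - 1; j++)
            for (int i = 1; i < nx - 1; i++)
                out[idx(i, j, k, nx, ny)] =
                    (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                     in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                     in[idx(i, j, k - 1, nx, ny)] + in[idx(i, j, k + 1, nx, ny)]) / 6.0;
}

/* Hypothetical helper: first z-plane assigned to the CPU when a fraction
 * gpu_fraction of the interior planes is given to the GPU. */
int split_plane(int nz, double gpu_fraction) {
    assert(nz >= 3 && gpu_fraction >= 0.0 && gpu_fraction <= 1.0);
    int interior = nz - 2;
    return 1 + (int)(gpu_fraction * interior);
}

/* Hypothetical hybrid sweep: the two z-ranges are disjoint, so they could be
 * updated concurrently by different devices. Both are run on the CPU here. */
void hybrid_sweep(const double *in, double *out,
                  int nx, int ny, int nz, double gpu_fraction) {
    int ks = split_plane(nz, gpu_fraction);
    stencil_sweep(in, out, nx, ny, nz, 1, ks);      /* stand-in for the GPU part */
    stencil_sweep(in, out, nx, ny, nz, ks, nz - 1); /* stand-in for the CPU part */
}
```

Because a Jacobi sweep reads only from `in` and writes only to `out`, the two partitions are independent within a time step. Only the planes adjacent to the split (and to neighboring MPI ranks) need to be exchanged between steps, which is the data movement that the abstract says is overlapped with computation.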
June 21, 2017 by hgpu