GPUAPI: Multi-level Chapel Runtime API for GPUs
Georgia Institute of Technology, Atlanta, Georgia, USA
The 8th Annual Chapel Implementers and Users Workshop (CHIUW’21), 2021
@article{hayashi2021gpuapi,
title={GPUAPI: Multi-level Chapel Runtime API for GPUs},
author={Hayashi, Akihiro and Paul, Sri Raj and Sarkar, Vivek},
year={2021}
}
Chapel is inherently well suited not only for homogeneous nodes but also for heterogeneous nodes because it employs the concepts of locales, distributed domains, forall/reduce constructs, and implicit communication. However, GPU support in Chapel still has room for improvement. This paper addresses some of the key limitations of past approaches to mapping Chapel onto GPUs. We introduce the GPUAPI module, which provides multi-level abstractions of existing low-level GPU APIs such as the CUDA runtime API. This module gives Chapel programmers the option of explicitly manipulating device memory (de)allocation and data transfer APIs at the Chapel level while maintaining good performance and productivity. The GPUAPI module is particularly useful when programmers dive into lower-level details to incrementally evolve their GPU implementations for improved performance on multiple heterogeneous nodes. We provide two tiers of GPU API: the MID-LOW-level API and the MID-level API. The MID-LOW-level API offers thin wrappers for raw GPU API routines, whereas the MID-level API provides a more Chapel-programmer-friendly interface, e.g., allocating device memory using the new keyword. The module also allows different levels of API to coexist, even with the prototype GPU code generator in Chapel 1.24. Our preliminary performance and productivity evaluations show that the use of the GPUAPI module significantly simplifies the manipulation of GPU APIs in Chapel for multiple CPUs+GPUs nodes while achieving the same performance.
June 20, 2021 by hgpu