Multi-GPU Support on Shared Memory System using Directive-based Programming Model
Department of Computer Science, University of Houston, Houston, USA
Scientific Programming, 2014
@article{xu2014multi,
title={Multi-GPU Support on Shared Memory System using Directive-based Programming Model},
author={Xu, Rengan and Tian, Xiaonan and Chandrasekaran, Sunita and Chapman, Barbara},
journal={Scientific Programming},
year={2014}
}
Existing and emerging studies show that using a single Graphics Processing Unit (GPU) can yield significant performance gains. These devices have tremendous processing capabilities, and we should be able to achieve further speedup by using more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered leading candidates for porting complex scientific applications. Unfortunately, programming heterogeneous systems requires much more effort than programming traditional systems or even multicore systems. Directive-based programming approaches are being widely adopted because they make application code easy to write, port, and maintain. One such popular model is OpenMP, a portable directive-based shared memory programming model. Similar to OpenMP is OpenACC, which is currently used extensively to port applications to accelerators. However, neither model provides support for multiple GPUs. A plausible solution is to combine OpenMP and OpenACC into a hybrid model; however, building this model has its own limitations due to the lack of necessary compiler support. Moreover, the hybrid model lacks support for direct device-to-device communication. This is an important issue to tackle, especially when using accelerators, since data transfer between host and device can be very expensive. With these as the motivating factors, in this paper we propose and develop programming strategies for heterogeneous systems. One of the strategies we employ is a hybrid model (OpenMP and OpenACC), and we critically analyze its applicability. The limitations of this model led to an alternate strategy in which we extend OpenACC by proposing and developing extensions that follow a task-based implementation for supporting multiple GPUs. We evaluate our strategies using two case studies and demonstrate their effectiveness.
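To make the hybrid model concrete, here is a minimal sketch of one common way to combine OpenMP and OpenACC: one OpenMP thread per device, with each thread binding to a device and offloading its own slice of the data. This is an illustration only, not the authors' implementation. The function name saxpy_multi_device and the SAXPY kernel are hypothetical choices made for the example, and the code assumes a compiler that accepts both OpenMP and OpenACC directives in the same source file. The preprocessor guards let it fall back to serial host execution when either model is unavailable.

```c
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical example kernel: y = a * x + y, with the index range
 * split into contiguous chunks, one chunk per OpenMP thread/device. */
void saxpy_multi_device(int n, float a, const float *x, float *y)
{
    int ndev = 1;
#ifdef _OPENACC
    /* Query how many devices of the current default type are present. */
    acc_device_t devtype = acc_get_device_type();
    ndev = acc_get_num_devices(devtype);
    if (ndev < 1)
        ndev = 1;
#endif

    /* One host thread per device; each thread drives its own device. */
    #pragma omp parallel num_threads(ndev)
    {
        int tid = 0;
        int nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
#ifdef _OPENACC
        /* Bind this host thread to one device. */
        acc_set_device_num(tid % ndev, devtype);
#endif
        /* Compute this thread's half-open chunk [lo, hi). */
        int chunk = (n + nthreads - 1) / nthreads;
        int lo = tid * chunk;
        if (lo > n)
            lo = n;
        int hi = lo + chunk;
        if (hi > n)
            hi = n;
        int len = hi - lo;

        if (len > 0) {
            /* Each slice travels host -> device and back to the host;
             * there is no direct device-to-device transfer here. */
            #pragma acc parallel loop copyin(x[lo:len]) copy(y[lo:len])
            for (int i = lo; i < hi; i++)
                y[i] = a * x[i] + y[i];
        }
    }
}
```

Note that every slice is staged through host memory in this sketch. If neighbouring slices needed to exchange data, that traffic would also pass through the host, which is the kind of cost the abstract points to when it notes the lack of direct device-to-device communication.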
February 2, 2015 by hgpu