Multi-GPU Support on Shared Memory System using Directive-based Programming Model

Rengan Xu, Xiaonan Tian, Sunita Chandrasekaran, Barbara Chapman

Department of Computer Science, University of Houston, Houston, USA

Scientific Programming, 2014

Existing and emerging studies show that a single Graphics Processing Unit (GPU) can deliver significant performance gains. These devices have tremendous processing capabilities, and using more than one GPU should yield further speedups. Heterogeneous systems consisting of multiple CPUs and GPUs offer immense potential and are often considered leading candidates for porting complex scientific applications. Unfortunately, programming heterogeneous systems requires much more effort than programming traditional single-processor systems or even multicore systems. Directive-based programming approaches are being widely adopted because they make application code easy to write, port, and maintain. One popular model is OpenMP, a portable directive-based shared-memory programming model. Similar to OpenMP is OpenACC, which is currently used extensively to port applications to accelerators. However, neither model provides support for multiple GPUs. A plausible solution is to combine OpenMP and OpenACC into a hybrid model, but building this model has its own limitations because the necessary compiler support is lacking. Moreover, the hybrid model lacks support for direct device-to-device communication. This is an important issue when using accelerators, since data transfer between host and device can be very expensive. With these motivating factors, in this paper we propose and develop programming strategies for heterogeneous systems. One strategy is a hybrid model (OpenMP and OpenACC), whose applicability we critically analyze. The limitations of this model led to an alternate strategy in which we extend OpenACC by proposing and developing extensions that follow a task-based implementation for supporting multiple GPUs. We evaluate our strategies using two case studies and demonstrate their effectiveness.
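To make the hybrid OpenMP and OpenACC idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes one OpenMP thread drives each GPU, selects its device with `acc_set_device_num`, and offloads its own slice of a vector update through an OpenACC region. The function name `saxpy_multi_gpu`, the SAXPY kernel, and the block partitioning are hypothetical choices made for illustration; the code also falls back to plain serial execution when compiled without OpenMP or OpenACC support.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper (not from the paper): computes y = a*x + y,
 * splitting the index range into one contiguous chunk per OpenMP thread.
 * Each thread binds to one accelerator and offloads only its chunk. */
void saxpy_multi_gpu(size_t n, float a, const float *x, float *y, int num_gpus)
{
    if (num_gpus < 1)
        num_gpus = 1;

    /* Assumed mapping: one host thread per GPU. */
    #pragma omp parallel num_threads(num_gpus)
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0;      /* serial fallback: a single chunk */
        int nthreads = 1;
#endif
        /* Block partition of [0, n) using the actual team size. */
        size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;
        size_t lo = (size_t)tid * chunk;
        if (lo > n)
            lo = n;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        size_t len = hi - lo;

#ifdef _OPENACC
        /* Bind this thread to a device, if any accelerator is present. */
        int ndev = acc_get_num_devices(acc_device_not_host);
        if (ndev > 0)
            acc_set_device_num(tid % ndev, acc_device_not_host);
#endif

        if (len > 0) {
            const float *xs = x + lo;
            float *ys = y + lo;

            /* Each device receives and returns only its own slice,
             * always by way of host memory. */
            #pragma acc parallel loop copyin(xs[0:len]) copy(ys[0:len])
            for (size_t i = 0; i < len; ++i)
                ys[i] = a * xs[i] + ys[i];
        }
    }
}
```

A caller would invoke it as `saxpy_multi_gpu(n, 2.0f, x, y, 2)` to target two devices. Every slice travels through host memory in this sketch, which illustrates the limitation noted in the abstract: the hybrid model offers no direct device-to-device communication, so any data shared between GPUs must take the expensive host route.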

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, OpenACC, Pthreads, Tesla K20

February 2, 2015 by hgpu
