high performance computing on graphics processing units: hgpu.org

Experiences with High-Level Programming Directives for Porting Applications to GPUs

Oscar Hernandez, Wei Ding, Barbara Chapman, Christos Kartsaklis, Ramanan Sankaran, and Richard Graham

Computer Science and Mathematics Division, Oak Ridge National Laboratory

Facing the Multicore – Challenge II, Lecture Notes in Computer Science, Volume 7174/2012, 96-107, 2012

DOI:10.1007/978-3-642-30397-5_9

@article{hernandez2012experiences,
  title={Experiences with High-Level Programming Directives for Porting Applications to GPUs},
  author={Hernandez, O. and Ding, W. and Chapman, B. and Kartsaklis, C. and Sankaran, R. and Graham, R.},
  journal={Facing the Multicore-Challenge II},
  pages={96--107},
  year={2012},
  publisher={Springer}
}

HPC systems now use GPUs within their compute nodes to accelerate program performance. As a result, high-end application development has become extremely complex at the node level. In addition to restructuring the node code to exploit both the CPU cores and the specialized devices, the programmer may need to combine a CPU programming model, such as OpenMP or CPU threads, with an accelerator programming model to share and manage the different node resources. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. To offset the high development cost of writing CUDA or OpenCL kernels, directives have been proposed for programming accelerator devices, but their implications are not well understood. In this paper, we use state-of-the-art accelerator directives to program several application kernels, explore the transformations needed to achieve good performance, and examine the expressivity and performance penalty of high-level directives compared with CUDA. We also compare our results with OpenMP implementations to understand the benefits of running the kernels on the accelerator rather than on the CPU cores.

Tags: Computer science, CUDA, nVidia, OpenCL, Tesla C2070

June 13, 2012 by hgpu
