high performance computing on graphics processing units: hgpu.org


Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring

Andrea Di Biagio, Giovanni Agosta

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy

23rd International Conference on Architecture of Computing Systems (ARCS), 2010

@article{biagio2010improved,
  title={Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring},
  author={Biagio, A.D. and Agosta, G.},
  journal={ARCS 2010},
  year={2010},
  publisher={VDE VERLAG GmbH}
}


The programmability of recent graphics processing unit (GPU) architectures has been the main factor driving the dramatic increase in interest in this class of architectures as low-cost accelerators for a wide range of high-performance applications. Current GPU programming models, such as OpenCL and CUDA, still expose too many architectural features, such as the memory hierarchy, to the programmer. We propose to raise the abstraction level of code by mapping some constructs of the well-known OpenMP parallel programming model onto the dominant CUDA GPU programming model. To this end, we are studying solutions for two main issues: the automated allocation of data on the GPU device memory hierarchy, and the translation of OpenMP parallel loops to CUDA kernels. We report some initial experimental results showing that the transformations are indeed promising.
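To make the loop-to-kernel idea concrete, here is a minimal sketch in plain C++ that emulates on the CPU the index mapping such a translation would typically rely on. It is not the authors' transformation: the function names (`saxpy_loop`, `saxpy_thread`, `saxpy_grid`), the SAXPY-style loop body, and the one-iteration-per-thread mapping are all assumptions chosen for illustration, and the data-allocation side of the paper is not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: the loop as it might appear in an OpenMP source file,
// where it would be annotated with "#pragma omp parallel for".
std::vector<float> saxpy_loop(float alpha,
                              const std::vector<float>& x,
                              const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = alpha * x[i] + y[i];
    }
    return out;
}

// Hypothetical kernel body: each (block, thread) pair handles one iteration.
// The index formula mirrors CUDA's blockIdx.x * blockDim.x + threadIdx.x.
void saxpy_thread(std::size_t block_idx, std::size_t block_dim,
                  std::size_t thread_idx, float alpha,
                  const std::vector<float>& x,
                  const std::vector<float>& y,
                  std::vector<float>& out) {
    const std::size_t i = block_idx * block_dim + thread_idx;
    // Guard: the last block may contain threads past the end of the data.
    if (i < out.size()) {
        out[i] = alpha * x[i] + y[i];
    }
}

// Emulated launch: iterate sequentially over every block and thread of a
// 1-D grid. On a GPU these iterations would run in parallel.
std::vector<float> saxpy_grid(float alpha,
                              const std::vector<float>& x,
                              const std::vector<float>& y,
                              std::size_t block_dim) {
    assert(x.size() == y.size());
    assert(block_dim > 0);
    std::vector<float> out(x.size());
    // Round up so that grid_dim * block_dim covers every element.
    const std::size_t grid_dim = (x.size() + block_dim - 1) / block_dim;
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            saxpy_thread(b, block_dim, t, alpha, x, y, out);
        }
    }
    return out;
}
```

The point of the sketch is that the loop induction variable disappears and is replaced by an index derived from the block and thread coordinates, with a bounds guard covering the case where the iteration count is not a multiple of the block size. Calling `saxpy_grid` with any positive `block_dim` should give the same result as `saxpy_loop`.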

Tags: Computer science, CUDA, nVidia, OpenMP, Programming techniques

June 21, 2011 by hgpu

