high performance computing on graphics processing units: hgpu.org


Automatic OpenCL code generation for multi-device heterogeneous architectures

Pei Li, Elisabeth Brunet, Francois Trahay, Christian Parrot, Gael Thomas, Raymond Namyst

Telecom SudParis

International Conference on Parallel Processing (ICPP ’15), 2015

@article{li2015automatic,
title={Automatic OpenCL code generation for multi-device heterogeneous architectures},
author={Li, Pei and Brunet, Elisabeth and Trahay, Fran{\c{c}}ois and Parrot, Christian and Thomas, Ga{\"e}l and Namyst, Raymond},
year={2015}
}


Using multiple accelerators, such as GPUs or Xeon Phis, is an attractive way to improve the performance of large data-parallel applications and to increase the size of their workloads. However, writing an application for multiple accelerators remains challenging, because moving from a single accelerator to several requires dealing with potentially nonuniform domain decomposition, inter-accelerator data movements, and dynamic load balancing. Writing such code by hand is time-consuming and error-prone. In this paper, we propose a new programming tool called STEPOCL, along with a new domain-specific language, designed to simplify the development of applications for multiple accelerators. We evaluate both the performance and the usefulness of STEPOCL with three applications and show that: (i) the performance of an application written with STEPOCL scales linearly with the number of accelerators; (ii) the performance of an application written with STEPOCL is competitive with that of a handwritten version; (iii) workloads too large to fit in the memory of a single device can run on multiple devices; and (iv) thanks to STEPOCL, the number of lines of code required to write an application for multiple accelerators is reduced roughly tenfold.
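To make the term "nonuniform domain decomposition" concrete, here is a minimal Python sketch of one way to split the rows of a grid across devices in proportion to their relative speed, with a one-row halo that neighbouring devices would have to exchange. It is not STEPOCL's generated code or its domain-specific language; the function name `partition_rows`, the `weights` parameter, and the `halo` width are hypothetical and chosen only for illustration.

```python
def partition_rows(n_rows, weights, halo=1):
    """Split rows [0, n_rows) across devices in proportion to `weights`.

    Hypothetical helper: `weights` stands for relative device throughput
    (for example, measured in a calibration run); `halo` is the number of
    neighbouring rows each device must read but does not own.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")

    total = float(sum(weights))

    # Cumulative boundaries, rounded so the pieces tile [0, n_rows) exactly.
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        bounds.append(round(n_rows * acc / total))
    bounds[-1] = n_rows  # guard against floating-point drift

    parts = []
    for dev, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        parts.append({
            "device": dev,
            "own": (lo, hi),  # rows this device computes
            # rows this device reads, including halo rows owned by neighbours
            "read": (max(0, lo - halo), min(n_rows, hi + halo)),
        })
    return parts


if __name__ == "__main__":
    # Two equal devices and one assumed to be twice as fast.
    for part in partition_rows(100, [1, 1, 2]):
        print(part)
```

The difference between each device's `read` and `own` ranges is the data that would have to move between accelerators after every iteration, and recomputing `weights` at run time is one simple form of dynamic load balancing.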

Tags: Code generation, Computer science, Heterogeneous systems, Intel Xeon Phi, nVidia, nVidia Quadro FX 5800, OpenCL

September 19, 2015 by hgpu

