high performance computing on graphics processing units: hgpu.org


Static Compilation Analysis for Host-Accelerator Communication Optimization

Mehdi Amini, Fabien Coelho, Francois Irigoin, Ronan Keryell

HPC Project, Meudon, France

24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC), 2011

@inproceedings{Amini2011c,
  author={Amini, Mehdi and Coelho, Fabien and Irigoin, Francois and Keryell, Ronan},
  title={Static Compilation Analysis for Host-Accelerator Communication Optimization},
  booktitle={24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC)},
  year={2011},
  address={Fort Collins, Colorado, USA},
  month={sep},
  note={Also Technical Report MINES ParisTech A/476/CRI}
}


Package: Par4All


We present an automatic, static program transformation that schedules and generates efficient memory transfers between a host computer and its hardware accelerator, addressing a well-known performance bottleneck. Our approach uses two simple heuristics: perform transfers to the accelerator as early as possible, and delay transfers back from the accelerator as long as possible. We implemented this transformation as a middle-end compilation pass in the PIPS/Par4All compiler. The generated code avoids redundant communications caused by data reuse between kernel executions, and the instructions that initiate transfers are scheduled effectively at compile time. We present experimental results obtained with the Polybench 2.0 suite, some Rodinia benchmarks, and a real numerical simulation. We obtain an average speedup of 4 to 5 when compared to a naive parallelization using a modern GPU with Par4All, HMPP, and PGI, and of 3.5 when compared to an OpenMP version using a 12-core multiprocessor.
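To make the effect of avoiding redundant transfers concrete, here is a minimal sketch in Python that counts host-accelerator copies for a toy program under two policies. It is not the paper's compile-time analysis: it is a small simulation written for this page, and the names `Step`, `step`, `naive_transfers`, `optimized_transfers`, and the example `program` are all hypothetical. It assumes every kernel overwrites its output arrays entirely, and it models only how many copies are needed, not where a compiler would place them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    where: str          # "kernel" (runs on the accelerator) or "host"
    reads: frozenset    # arrays read by this step
    writes: frozenset   # arrays fully overwritten by this step (assumption)


def step(where, reads, writes):
    """Convenience constructor (hypothetical helper)."""
    return Step(where, frozenset(reads), frozenset(writes))


def naive_transfers(program):
    """Naive policy: copy all inputs in before each kernel, all outputs back after."""
    return sum(len(s.reads) + len(s.writes)
               for s in program if s.where == "kernel")


def optimized_transfers(program, live_out):
    """Copy an array only when the side that needs it holds a stale version."""
    arrays = set()
    for s in program:
        arrays |= s.reads | s.writes
    host_valid = set(arrays)   # up-to-date copies on the host (all, initially)
    dev_valid = set()          # up-to-date copies on the accelerator
    transfers = 0
    for s in program:
        here, there = ((dev_valid, host_valid) if s.where == "kernel"
                       else (host_valid, dev_valid))
        for a in s.reads:
            if a not in here:  # stale on this side: one copy is required
                transfers += 1
                here.add(a)
        for a in s.writes:     # a write invalidates the other side's copy
            here.add(a)
            there.discard(a)
    # Results that outlive the program must end up on the host.
    transfers += sum(1 for a in live_out if a not in host_valid)
    return transfers


# Toy program: three kernels reuse data that can stay on the accelerator.
program = [
    step("host", [], ["a", "b"]),         # initialization on the host
    step("kernel", ["a", "b"], ["c"]),
    step("kernel", ["c", "a"], ["d"]),
    step("kernel", ["d"], ["c"]),
    step("host", ["c"], []),              # the host finally consumes c
]

print(naive_transfers(program))               # 8
print(optimized_transfers(program, {"c"}))    # 3
```

In this toy case the naive policy performs 8 copies, while tracking which side holds the valid data needs only 3: `a` and `b` go to the accelerator once, and `c` comes back only when the host reads it. The intermediate array `d` never leaves the accelerator.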

Tags: Benchmarking, Computer science, CUDA, Numerical simulation, nVidia, OpenMP, Optimization, Package, Tesla C2050

October 11, 2011 by hgpu
