5862

Static Compilation Analysis for Host-Accelerator Communication Optimization

Mehdi Amini, Fabien Coelho, Francois Irigoin, Ronan Keryell
HPC Project, Meudon, France
24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC), 2011
BibTeX

Download Download (PDF)   View View   Source Source   Source codes Source codes

Package:

1815

views

We present an automatic, static program transformation that schedules and generates efficient memory transfers between a computer host and its hardware accelerator, addressing a well-known performance bottleneck. Our automatic approach uses two simple heuristics: to perform transfers to the accelerator as early as possible and to delay transfers back from the accelerator as late as possible. We implemented this transformation as a middle-end compilation pass in the pips/Par4All compiler. In the generated code, redundant communications due to data reuse between kernel executions are avoided. Instructions that initiate transfers are scheduled effectively at compile-time. We present experimental results obtained with the Polybench 2.0, some Rodinia benchmarks, and with a real numerical simulation. We obtain an average speedup of 4 to 5 when compared to a naive parallelization using a modern gpu with Par4All, hmpp, and pgi, and 3.5 when compared to an OpenMP version using a 12-core multiprocessor.
Rating: 2.5/5. From 1 vote.
Please wait...

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org