Static Compilation Analysis for Host-Accelerator Communication Optimization

Mehdi Amini, Fabien Coelho, Francois Irigoin, Ronan Keryell
HPC Project, Meudon, France
24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC), 2011

@inproceedings{Amini2011c,
   author={Amini, Mehdi and Coelho, Fabien and Irigoin, Francois and Keryell, Ronan},
   title={Static Compilation Analysis for Host-Accelerator Communication Optimization},
   booktitle={24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC)},
   year={2011},
   address={Fort Collins, Colorado, USA},
   month={sep},
   note={Also Technical Report MINES ParisTech A/476/CRI}
}

We present an automatic, static program transformation that schedules and generates efficient memory transfers between a computer host and its hardware accelerator, addressing a well-known performance bottleneck. Our approach uses two simple heuristics: perform transfers to the accelerator as early as possible, and perform transfers back from the accelerator as late as possible. We implemented this transformation as a middle-end compilation pass in the PIPS/Par4All compiler. The generated code avoids redundant communications caused by data reuse between kernel executions, and the instructions that initiate transfers are scheduled at compile time. We present experimental results obtained with the Polybench 2.0 suite, some Rodinia benchmarks, and a real numerical simulation. We obtain an average speedup of 4 to 5 when compared to a naive parallelization using a modern GPU with Par4All, HMPP, and PGI, and 3.5 when compared to an OpenMP version using a 12-core multiprocessor.
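
The effect of the two heuristics can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the PIPS/Par4All pass itself: the `Accelerator` class, its `copy_in`, `copy_out`, and `kernel` methods, and the two `run_*` functions are hypothetical names invented here. The sketch counts the transfers issued when the same array is reused by several kernel launches, first with a copy around every launch and then with the copy-in hoisted before the loop and the copy-out delayed until after it.

```python
class Accelerator:
    """Toy accelerator model (hypothetical): a separate memory plus a transfer counter."""

    def __init__(self):
        self.memory = {}
        self.transfers = 0

    def copy_in(self, name, data):
        # Host-to-accelerator transfer.
        self.memory[name] = list(data)
        self.transfers += 1

    def copy_out(self, name):
        # Accelerator-to-host transfer.
        self.transfers += 1
        return list(self.memory[name])

    def kernel(self, name):
        # Stand-in for a real kernel: increment every element in place.
        self.memory[name] = [x + 1 for x in self.memory[name]]


def run_naive(data, steps):
    """Transfer the array in and out around every kernel launch."""
    acc = Accelerator()
    for _ in range(steps):
        acc.copy_in("a", data)
        acc.kernel("a")
        data = acc.copy_out("a")
    return data, acc.transfers


def run_hoisted(data, steps):
    """Copy in as early as possible, copy out as late as possible."""
    acc = Accelerator()
    acc.copy_in("a", data)          # hoisted before the loop
    for _ in range(steps):
        acc.kernel("a")             # data stays resident between launches
    data = acc.copy_out("a")        # delayed until after the loop
    return data, acc.transfers


if __name__ == "__main__":
    print(run_naive([0, 0], 3))     # same result as below, 2 transfers per step
    print(run_hoisted([0, 0], 3))   # same result, 2 transfers in total
```

Both schedules compute the same result, but the naive one issues two transfers per kernel launch while the hoisted one issues two in total. The hoisting is only valid here because the host never reads or writes the array inside the loop; establishing that kind of fact statically is what the compiler analysis described in the abstract is for.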