Static Compilation Analysis for Host-Accelerator Communication Optimization
HPC Project, Meudon, France
24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC), 2011
@inproceedings{Amini2011c,
  author    = {Amini, Mehdi and Coelho, Fabien and Irigoin, Fran\c{c}ois and Keryell, Ronan},
  title     = {Static Compilation Analysis for Host-Accelerator Communication Optimization},
  booktitle = {24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC)},
  year      = {2011},
  address   = {Fort Collins, Colorado, USA},
  month     = sep,
  note      = {Also Technical Report MINES ParisTech A/476/CRI}
}
We present an automatic, static program transformation that schedules and generates efficient memory transfers between a computer host and its hardware accelerator, addressing a well-known performance bottleneck. Our approach relies on two simple heuristics: perform transfers to the accelerator as early as possible, and delay transfers back from the accelerator as late as possible. We implemented this transformation as a middle-end compilation pass in the PIPS/Par4All compiler. In the generated code, redundant communications due to data reuse between kernel executions are avoided, and the instructions that initiate transfers are scheduled entirely at compile time. We present experimental results obtained with the PolyBench 2.0 suite, some Rodinia benchmarks, and a real numerical simulation. We obtain an average speedup of 4 to 5 over a naive parallelization using a modern GPU with Par4All, HMPP, and PGI, and of 3.5 over an OpenMP version running on a 12-core multiprocessor.
October 11, 2011 by hgpu