high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Automatic CPU-GPU communication management and optimization

Automatic CPU-GPU communication management and optimization

Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P. Johnson, Stephen R. Beard, David I. August

Princeton University, Princeton, NJ

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, PLDI ’11, 2011

DOI:10.1145/1993498.1993516

BibTeX

Download (PDF)

View

Source

1930

views

The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations is limited by the complexities of CPU-GPU communication. To address these communications problems, this paper presents the first fully automatic system for managing and optimizing CPU-GPU communcation. This system, called the CPU-GPU Communication Manager (CGCM), consists of a run-time library and a set of compiler transformations that work together to manage and optimize CPU-GPU communication without depending on the strength of static compile-time analyses or on programmer-supplied annotations. CGCM eases manual GPU parallelizations and improves the applicability and performance of automatic GPU parallelizations. For 24 programs, CGCM-enabled automatic GPU parallelization yields a whole program geomean speedup of 5.36x over the best sequential CPU-only execution.

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce GTX 480, Optimization, Performance, Programming techniques

September 7, 2011 by hgpu

Rating: 2.5/5. From 1 vote.

Please wait...

Your response

You must be logged in to post a comment.

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

See all packages

* * *

high performance computing on graphics processing units: hgpu.org

Automatic CPU-GPU communication management and optimization

Your response

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

Most viewed papers (last 30 days)

Automatic CPU-GPU communication management and optimization

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)