high performance computing on graphics processing units: hgpu.org


Energy-efficient Computing on Distributed GPUs using Dynamic Parallelism and GPU-controlled Communication

Lena Oden, Benjamin Klenk, Holger Fröning

Fraunhofer Institute for Industrial Mathematics, Competence Center High Performance Computing, Kaiserslautern, Germany

Second Workshop on Energy-efficient Super-Computing (E2SC), 2014


GPUs are widely used in high performance computing due to their high computational power and high performance per Watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs, which affects not only performance but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU accelerates the computation while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete application. In recent work we have shown that the GPU is able to control the communication independently of the CPU. Still, GPU-controlled communication raises several problems. The main one is intra-GPU synchronization: GPU blocks are non-preemptive, so a block that waits for a communication request keeps its resources until it completes, and the blocks it depends on may never be scheduled. Issuing communication requests from within a GPU kernel can therefore easily result in a deadlock. In this work we show how Dynamic Parallelism solves this problem. Combining GPU-controlled communication with Dynamic Parallelism keeps the control flow of multi-GPU applications on the GPU and bypasses the CPU completely. Although applications using GPU-controlled communication still perform slightly worse than hybrid applications, we show that performance per Watt increases by up to 10% while still using commodity hardware.
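The deadlock caused by non-preemptive blocks can be illustrated with a toy scheduling model. This is a sketch under stated assumptions, not the authors' implementation: it assumes a kernel in which every block must finish computing (a grid-wide barrier) before any block communicates, and a device that holds at most `slots` resident blocks at a time. The function names are hypothetical. The second function models the same work split into a compute kernel and a communication kernel launched one after the other, which is one way a parent kernel using Dynamic Parallelism could keep the control flow on the GPU; here the kernel boundary replaces the in-kernel barrier.

```python
from collections import deque


def monolithic_kernel_completes(n_blocks: int, slots: int) -> bool:
    """Toy model (assumption): ONE kernel whose blocks compute, then spin at a
    grid-wide barrier before communicating.

    Blocks are non-preemptive: a resident block keeps its slot until it exits.
    Returns True if the kernel finishes, False if it deadlocks.
    """
    pending = deque(range(n_blocks))  # blocks not yet scheduled
    resident: set[int] = set()        # blocks currently occupying a slot
    waiting: set[int] = set()         # resident blocks spinning at the barrier
    while True:
        # The scheduler places pending blocks into free slots.
        while pending and len(resident) < slots:
            resident.add(pending.popleft())
        # Every resident block finishes its compute phase and starts waiting.
        newly_waiting = resident - waiting
        waiting |= newly_waiting
        if len(waiting) == n_blocks:
            # All blocks arrived: the barrier opens, blocks communicate and exit.
            return True
        if not newly_waiting:
            # No waiting block can exit, so no slot is ever freed: deadlock.
            return False


def split_kernels_waves(n_blocks: int, slots: int) -> int:
    """Toy model (assumption): the same work as two kernels (compute, then
    communicate) launched in order. Each block exits after its phase and frees
    its slot, so the work always completes. Returns the number of scheduling
    waves across both kernels."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    waves_per_kernel = -(-n_blocks // slots)  # ceil(n_blocks / slots)
    return 2 * waves_per_kernel
```

For example, with 8 resident slots a 4-block grid completes in the single-kernel model, while a 16-block grid deadlocks; the split version of the 16-block case finishes in 4 waves (2 per kernel). The model ignores timing and power entirely and only shows why waiting inside a non-preemptive kernel is fragile.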

Tags: Cluster computing, Computer science, CUDA, Distributed computing, Energy-efficient computing, GPU cluster, nVidia, Tesla K20

March 25, 2015 by hgpu
