high performance computing on graphics processing units: hgpu.org

Dynamic load balancing on single- and multi-GPU systems

Long Chen, Oreste Villa, Sriram Krishnamoorthy, Guang R. Gao

Department of Electrical & Computer Engineering, University of Delaware, Newark, DE 19716

IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

DOI:10.1109/IPDPS.2010.5470413

The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular and unbalanced workloads. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently, which are commonly available in many modern systems. In this paper, we propose a task-based dynamic load-balancing solution for single- and multi-GPU systems. The solution allows load balancing at a finer granularity than what is supported in current GPU programming APIs, such as NVIDIA’s CUDA. We evaluate our approach using both micro-benchmarks and a molecular dynamics application that exhibits significant load imbalance. Experimental results with a single-GPU configuration show that our fine-grained task solution can utilize the hardware more efficiently than the CUDA scheduler for unbalanced workloads. On multi-GPU systems, our solution achieves near-linear speedup, load balance, and significant performance improvement over techniques based on standard CUDA APIs.
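
To give a rough intuition for why task-level dynamic balancing helps on unbalanced workloads, the sketch below simulates two scheduling policies on the host in Python. It is not the authors' GPU implementation: the function names, the task costs, and the worker count are all hypothetical, and "workers" merely stand in for whatever execution units pull tasks. Static scheduling pre-assigns equal-sized chunks of tasks, while dynamic scheduling lets each idle worker take the next task from a shared queue.

```python
import heapq

def static_makespan(costs, workers):
    """Pre-assign equal-sized contiguous chunks of tasks to each worker."""
    chunk = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)]
    return max(loads)  # finish time is set by the most loaded worker

def dynamic_makespan(costs, workers):
    """Idle workers pull the next task from a shared queue (simulated)."""
    free_at = [0] * workers  # min-heap: time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:  # tasks leave the queue in order
        start = heapq.heappop(free_at)  # earliest idle worker takes the task
        heapq.heappush(free_at, start + cost)
    return max(free_at)

# Hypothetical unbalanced workload: four expensive tasks, eight cheap ones.
costs = [8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1]

print(static_makespan(costs, 4))   # 24
print(dynamic_makespan(costs, 4))  # 10
```

With these made-up costs, static chunking leaves one worker with three expensive tasks, while the shared queue spreads the work so every worker finishes at the same time. A real GPU version would additionally have to handle concurrent queue access and host-device communication, which this simulation ignores.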

Tags: Computer science, CUDA, nVidia, Optimization, Performance, Task scheduling, Tesla C1060

March 6, 2011 by hgpu
