high performance computing on graphics processing units: hgpu.org

Exploring Fine-Grained Task-based Execution on Multi-GPU Systems

Long Chen, Oreste Villa, Guang R. Gao

Qualcomm Incorporated, San Diego, CA 92121

IEEE International Conference on Cluster Computing (CLUSTER), 2011

DOI:10.1109/CLUSTER.2011.50

@inproceedings{chen2011exploring,
  title={Exploring Fine-Grained Task-based Execution on Multi-GPU Systems},
  author={Chen, L. and Villa, O. and Gao, G.R.},
  booktitle={Cluster Computing (CLUSTER), 2011 IEEE International Conference on},
  pages={386--394},
  organization={IEEE},
  year={2011}
}

Multi-GPU systems, including GPU clusters, are gaining popularity in scientific computing. However, when multiple GPUs are used concurrently, conventional data-parallel GPU programming paradigms such as CUDA cannot satisfactorily address certain issues, including load balancing, GPU resource utilization, and overlapping fine-grained computation with communication. In this paper, we present a fine-grained task-based execution framework for multi-GPU systems. By scheduling tasks among multiple GPUs at a finer granularity than the conventional CUDA programming method supports, and by allowing concurrent task execution on a single GPU, our framework provides a means of solving these issues and utilizing multi-GPU systems efficiently. Experiments with a molecular dynamics application show that, for nonuniformly distributed workloads, solutions based on our framework achieve good load balance and considerable performance improvement over solutions based on standard CUDA programming methodologies.
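The load-balancing argument in the abstract can be illustrated with a small simulation. The sketch below is not the authors' framework and involves no GPU code; it is a toy model, under assumed task costs, that compares a static split of work across devices with idle devices pulling small tasks from a shared queue. The function names `static_makespan` and `dynamic_makespan` and the cost list are hypothetical, chosen only for this illustration.

```python
import heapq


def static_makespan(task_costs, num_gpus):
    """Coarse-grained model: split tasks into contiguous, equal-count chunks,
    one chunk per GPU, and return the finish time of the slowest GPU."""
    chunk = (len(task_costs) + num_gpus - 1) // num_gpus
    loads = [sum(task_costs[i * chunk:(i + 1) * chunk]) for i in range(num_gpus)]
    return max(loads)


def dynamic_makespan(task_costs, num_gpus):
    """Fine-grained model: whichever GPU becomes free first takes the next
    task from a shared queue. Returns the finish time of the last GPU."""
    free_at = [0] * num_gpus  # min-heap of times at which each GPU is free
    heapq.heapify(free_at)
    for cost in task_costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Assumed nonuniform workload: 4 heavy tasks followed by 12 light ones.
costs = [10] * 4 + [1] * 12

print(static_makespan(costs, 4))   # 40: one GPU receives all the heavy tasks
print(dynamic_makespan(costs, 4))  # 13: the heavy tasks are spread across GPUs
```

In this toy model the total work is 52 units, so 13 is the best possible finish time on four devices. The static split misses it only because the heavy tasks are clustered in one chunk, which is the kind of nonuniform distribution the abstract refers to.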

Tags: Computer science, CUDA, GPU cluster, Molecular dynamics, nVidia, Tesla C1060

November 1, 2011 by hgpu
