high performance computing on graphics processing units: hgpu.org


Fine-grain Task Aggregation and Coordination on GPUs

Marc S. Orr, Bradford M. Beckmann, Steven K. Reinhardt, David A. Wood

University of Wisconsin-Madison

41st International Symposium on Computer Architecture (ISCA-2014), 2014

@article{orr2014fine,
  title={Fine-grain Task Aggregation and Coordination on GPUs},
  author={Orr, Marc S and Beckmann, Bradford M and Reinhardt, Steven K and Wood, David A},
  year={2014}
}


In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction, which facilitates dynamically aggregating asynchronously produced fine-grain work into coarser-grain tasks. However, no practical implementation has been proposed. To this end, we propose and evaluate the first channel implementation. To demonstrate the utility of channels, we present a case study that maps the fine-grain, recursive task spawning in the Cilk programming language to channels by representing it as a flow graph. To support data-parallel recursion in bounded memory, we propose a hardware mechanism that allows wavefronts to yield their execution resources. Through channels and wavefront yield, we implement four Cilk benchmarks. We show that Cilk can scale with the GPU architecture, achieving speedups of as much as 4.3x on eight compute units.
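The aggregation idea in the abstract can be illustrated with a toy, CPU-only sketch. This is not the authors' implementation and does not model GPU hardware or wavefront yield; the `Channel` class, its `width` parameter, and `fib_via_channel` are hypothetical names introduced only for illustration. Fine-grain tasks are pushed into a queue one at a time, and a consumer drains them in fixed-width batches so that every item in a batch runs the same function, loosely mimicking a data-parallel launch. Fibonacci stands in for a recursive Cilk-style spawn: each task either contributes a base-case value or spawns two smaller tasks.

```python
from collections import deque


class Channel:
    """Toy FIFO that hands out queued work in fixed-width batches (hypothetical API)."""

    def __init__(self, width):
        self.width = width      # assumed batch width, standing in for a wavefront
        self.items = deque()

    def push(self, item):
        # Producers add fine-grain work items one at a time.
        self.items.append(item)

    def pop_batch(self):
        # The consumer takes up to `width` items as one coarser-grain task.
        n = min(self.width, len(self.items))
        return [self.items.popleft() for _ in range(n)]

    def __len__(self):
        return len(self.items)


def fib_via_channel(n, width=4):
    """Return (fib(n), number of batches launched) using channel-style aggregation."""
    channel = Channel(width)
    channel.push(n)
    total = 0
    batches = 0
    while len(channel):
        batch = channel.pop_batch()
        batches += 1
        # Every item in the batch runs the same function body.
        for k in batch:
            if k < 2:
                total += k              # base case contributes its value
            else:
                channel.push(k - 1)     # "spawn" two finer-grain tasks
                channel.push(k - 2)
    return total, batches


if __name__ == "__main__":
    value, launches = fib_via_channel(10)
    print(value, launches)
```

With `width=1` every task becomes its own launch; a wider channel packs the same tasks into fewer, fuller batches, which is the effect aggregation is meant to achieve.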

Tags: Computer science, GPGPU-sim, Hardware Architecture, OpenCL, Performance

May 10, 2014 by hgpu

