Fine-Grained Synchronizations and Dataflow Programming on GPUs
Eindhoven University of Technology, Eindhoven, Netherlands
International Conference on Supercomputing (ICS), 2015
@inproceedings{li2015fine,
  title={Fine-Grained Synchronizations and Dataflow Programming on GPUs},
  author={Li, Ang and van den Braak, Gert-Jan and Corporaal, Henk and Kumar, Akash},
  booktitle={International Conference on Supercomputing (ICS)},
  year={2015}
}
The last decade has witnessed the blooming emergence of many-core platforms, especially graphics processing units (GPUs). With the exponential growth in the number of cores in GPUs, utilizing them efficiently becomes a challenge. The data-parallel programming model assumes a single instruction stream for multiple concurrent threads (SIMT); therefore little support is offered to enforce thread ordering and fine-grained synchronizations. This becomes an obstacle when migrating algorithms that exploit fine-grained parallelism, such as data-flow algorithms, to GPUs. In this paper, we propose a novel approach for fine-grained inter-thread synchronization on the shared memory of modern GPUs. We demonstrate its performance and compare it with other fine-grained and medium-grained synchronization approaches. Our method achieves 1.5x speedup over the warp-barrier based approach and 4.0x speedup over the atomic spin-lock based approach on average. To further explore the possibility of realizing fine-grained data-flow algorithms on GPUs, we apply the proposed synchronization scheme to Needleman-Wunsch, a 2D wavefront application involving massive cross-loop data dependencies. Our implementation achieves 3.56x speedup over the atomic spin-lock implementation and 1.15x speedup over the conventional data-parallel implementation for a basic sub-grid, which implies that the fine-grained, lock-based programming pattern could be an alternative choice for designing general-purpose GPU (GPGPU) applications.
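The paper's own synchronization scheme is not reproduced here; as a rough illustration of the kind of fine-grained, shared-memory synchronization the abstract contrasts with coarse barriers, the CUDA sketch below hands a value from one warp to another through a shared-memory flag instead of a block-wide __syncthreads(). The kernel name warp_handoff, the block size, and the produced value are illustrative assumptions, not taken from the paper.

#include <cstdio>
#include <cuda_runtime.h>

// Producer warp writes a value into shared memory and raises a flag;
// the consumer warp spins on the flag instead of waiting at a
// block-wide __syncthreads() barrier. (Illustrative sketch only.)
__global__ void warp_handoff(int *out)
{
    __shared__ volatile int data;    // volatile: loads/stores go straight to shared memory
    __shared__ volatile int ready;   // synchronization flag

    if (threadIdx.x == 0)
        ready = 0;
    __syncthreads();                 // initialize the flag once for the whole block

    if (threadIdx.x < 32) {          // producer warp
        if (threadIdx.x == 0) {
            data = 42;               // produce the value
            __threadfence_block();   // order the data store before the flag store
            ready = 1;               // signal the consumer warp
        }
    } else {                         // consumer warp
        while (ready == 0)           // spinning across warps is safe; spinning on a
            ;                        // lock *within* a warp can hang pre-Volta GPUs
        if (threadIdx.x == 32)
            out[0] = data;
    }
}

int main()
{
    int *d_out = nullptr, h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    warp_handoff<<<1, 64>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("consumer warp read %d\n", h_out);
    cudaFree(d_out);
    return 0;
}

The atomic spin-lock baseline mentioned in the abstract would instead guard the shared data with a lock word updated via atomicCAS/atomicExch; the reported speedups are measured against such baselines, not against this toy example.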
May 3, 2015 by hgpu