Harmonic CUDA: Asynchronous Programming on GPUs

Jonathan Wapman, Sean Treichler, Serban D. Porumbescu, John D. Owens

University of California, Davis, Davis, California, USA

Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM’23), February 2023

DOI:10.1145/3582514.3582517

We introduce Harmonic CUDA, a dataflow programming model for GPUs that allows programmers to describe algorithms as a dependency graph of producers and consumers, with data flowing continuously through the graph for the duration of the kernel. This makes it easier for programmers to exploit asynchrony, warp specialization, and hardware acceleration. Using Harmonic CUDA, we implement two example applications: matrix multiplication and GraphSage. The matrix multiplication kernel demonstrates how a key kernel can be broken down into more granular building blocks. It reaches a geometric mean of 80% of cuBLAS performance, and up to 92% when small matrices are omitted, and we analyze how its performance could be improved further. GraphSage shows how asynchrony and warp specialization can provide significant performance improvements while reusing the same building blocks as the matrix multiplication kernel: switching from a bulk-synchronous implementation to a warp-specialized one improves performance by 34%. This paper evaluates the strengths and weaknesses of Harmonic CUDA based on these test cases and suggests future work to improve the programming model.
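
The producer/consumer structure described above can be illustrated, by analogy only, with a small CPU sketch in C++. This is not Harmonic CUDA's API, which the abstract does not show. The `Channel` class and `run_pipeline` function are hypothetical names, and host threads stand in for the warps that a warp-specialized kernel would dedicate to each stage. Each stage runs concurrently and passes data through a bounded buffer, so loading, computing, and reducing overlap instead of alternating in bulk-synchronous phases.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical bounded channel: one edge of a producer/consumer dataflow graph.
template <typename T>
class Channel {
 public:
  explicit Channel(std::size_t capacity) : capacity_(capacity) {}

  // Blocks while the buffer is full, applying back-pressure to the producer.
  void push(T value) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [&] { return buf_.size() < capacity_; });
    buf_.push_back(std::move(value));
    not_empty_.notify_one();
  }

  // Signals that the producer will push no more values.
  void close() {
    std::lock_guard<std::mutex> lock(mu_);
    closed_ = true;
    not_empty_.notify_all();
  }

  // Blocks until a value arrives; returns nullopt once closed and drained.
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [&] { return !buf_.empty() || closed_; });
    if (buf_.empty()) return std::nullopt;
    T value = std::move(buf_.front());
    buf_.pop_front();
    not_full_.notify_one();
    return value;
  }

 private:
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
  std::deque<T> buf_;
  std::size_t capacity_;
  bool closed_ = false;
};

// Hypothetical three-stage graph (load -> compute -> reduce) with all stages
// running concurrently, each on its own thread.
long long run_pipeline(int n) {
  Channel<int> loaded(4);          // small buffer, like a few staging slots
  Channel<long long> computed(4);
  long long total = 0;             // written only by the reducer thread

  std::thread loader([&] {
    for (int i = 0; i < n; ++i) loaded.push(i);  // stand-in for fetching a tile
    loaded.close();
  });
  std::thread worker([&] {
    while (auto v = loaded.pop()) computed.push(static_cast<long long>(*v) * *v);
    computed.close();
  });
  std::thread reducer([&] {
    while (auto v = computed.pop()) total += *v;
  });

  loader.join();
  worker.join();
  reducer.join();
  return total;  // safe to read after all threads have joined
}
```

For example, `run_pipeline(10)` returns 285, the sum of the squares of 0 through 9. The bounded capacity makes a fast producer stall instead of running ahead. This is loosely comparable to a producer warp waiting for a free staging buffer on a GPU.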

Tags: Computer science, CUDA, GEMM, Linear Algebra, Matrix multiplication, nVidia, nVidia A100

March 5, 2023 by hgpu