Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines
School of Technology, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil
Simpósio Brasileiro de Linguagens de Programação (SBLP), 29, 2025
@inproceedings{fae2025towards,
title={Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines},
author={Fa{\'e}, Leonardo Gibrowski and Griebler, Dalvan},
booktitle={Simp{\'o}sio Brasileiro de Linguagens de Programa{\c{c}}{\~a}o (SBLP)},
pages={75--83},
year={2025},
organization={SBC}
}
Programming Graphics Processing Units (GPUs) for general-purpose computation remains a daunting task, often requiring specialized knowledge of low-level APIs such as CUDA or OpenCL. While Rust has emerged as a modern, safe, and performant systems programming language, its adoption in the GPU computing domain is still nascent. Existing approaches often involve intricate compiler modifications or complex static analysis to adapt CPU-centric Rust code for GPU execution. This paper presents a novel high-level abstraction in Rust that leverages procedural macros to automatically generate GPU-executable code from constrained Rust functions. Our approach simplifies code generation by imposing specific limitations on how these functions can be written, thereby avoiding the need for complex static analysis. We demonstrate the feasibility and effectiveness of our abstraction through a case study involving linear pipeline parallel patterns, a common structure in data-parallel applications. By transforming Rust functions annotated as source, stage, or sink in a pipeline, we enable straightforward execution on the GPU. We evaluate our abstraction's performance and programmability using two benchmark applications: sobel (image filtering) and latbol (fluid simulation), comparing them against manual OpenCL implementations. Our results indicate that, while incurring a small performance overhead in some cases, our approach significantly reduces development effort and, in certain scenarios, achieves comparable or even superior throughput relative to CPU-based parallelism.
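To make the programming model concrete, the following is a minimal CPU-only sketch of the source/stage/sink pipeline structure the abstract describes. The attribute names shown in comments and the function signatures are assumptions for illustration, not the paper's actual macro API; in the real work, procedural macros would translate such annotated functions into GPU (OpenCL) code.

```rust
// Hypothetical sketch of a linear pipeline in the source/stage/sink style.
// In the paper's approach, attributes (imagined here as #[source], #[stage],
// #[sink]) would be procedural macros generating GPU kernels; this sketch
// only mimics the structure on the CPU.

// #[source] -- produces the input items (here, simple pixel-like values)
fn source() -> Vec<f32> {
    (0..8).map(|i| i as f32).collect()
}

// #[stage] -- a data-parallel transformation applied independently to each
// item, standing in for a kernel such as one Sobel filter step
fn stage(x: f32) -> f32 {
    x * 2.0
}

// #[sink] -- consumes the transformed items (here, reduces them to a sum)
fn sink(items: &[f32]) -> f32 {
    items.iter().sum()
}

fn main() {
    // The linear pipeline: source -> stage -> sink.
    let out: Vec<f32> = source().into_iter().map(stage).collect();
    println!("{}", sink(&out)); // prints 56
}
```

Because each stage is a pure, element-wise function with no shared mutable state, the same structure maps naturally onto a GPU kernel launch per stage, which is the constraint the paper exploits to avoid complex static analysis.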
September 28, 2025 by hgpu