Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines
School of Technology, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil
Simpósio Brasileiro de Linguagens de Programação (SBLP), 29, 2025
@inproceedings{fae2025towards,
title={Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines},
author={Fa{\'e}, Leonardo Gibrowski and Griebler, Dalvan},
booktitle={Simp{\'o}sio Brasileiro de Linguagens de Programa{\c{c}}{\~a}o (SBLP)},
pages={75--83},
year={2025},
organization={SBC}
}
Programming Graphics Processing Units (GPUs) for general-purpose computation remains a daunting task, often requiring specialized knowledge of low-level APIs such as CUDA or OpenCL. While Rust has emerged as a modern, safe, and performant systems programming language, its adoption in the GPU computing domain is still nascent. Existing approaches often involve intricate compiler modifications or complex static analysis to adapt CPU-centric Rust code for GPU execution. This paper presents a novel high-level abstraction in Rust that leverages procedural macros to automatically generate GPU-executable code from constrained Rust functions. Our approach simplifies code generation by imposing specific limitations on how these functions can be written, thereby avoiding the need for complex static analysis. We demonstrate the feasibility and effectiveness of our abstraction through a case study involving linear pipeline parallel patterns, a common structure in data-parallel applications. By transforming Rust functions annotated as source, stage, or sink in a pipeline, we enable straightforward execution on the GPU. We evaluate our abstraction's performance and programmability using two benchmark applications: sobel (image filtering) and latbol (fluid simulation), comparing them against manual OpenCL implementations. Our results indicate that, while incurring a small performance overhead in some cases, our approach significantly reduces development effort and, in certain scenarios, achieves comparable or even superior throughput relative to CPU-based parallelism.
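To make the programming model concrete, the following is a minimal CPU-only sketch of the source/stage/sink pipeline structure the abstract describes. The attribute names shown in comments and the function signatures are assumptions for illustration, not the paper's actual macro API; in the real work, procedural macros would translate such annotated functions into GPU (OpenCL) code.

```rust
// Hypothetical sketch of a linear pipeline in the source/stage/sink style.
// In the paper's approach, attributes (imagined here as #[source], #[stage],
// #[sink]) would be procedural macros generating GPU kernels; this sketch
// only mimics the structure on the CPU.

// #[source] -- produces the input items (here, simple pixel-like values)
fn source() -> Vec<f32> {
    (0..8).map(|i| i as f32).collect()
}

// #[stage] -- a data-parallel transformation applied independently to each
// item, standing in for a kernel such as one Sobel filter step
fn stage(x: f32) -> f32 {
    x * 2.0
}

// #[sink] -- consumes the transformed items (here, reduces them to a sum)
fn sink(items: &[f32]) -> f32 {
    items.iter().sum()
}

fn main() {
    // The linear pipeline: source -> stage -> sink.
    let out: Vec<f32> = source().into_iter().map(stage).collect();
    println!("{}", sink(&out)); // prints 56
}
```

Because each stage is a pure, element-wise function with no shared mutable state, the same structure maps naturally onto a GPU kernel launch per stage, which is the constraint the paper exploits to avoid complex static analysis.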
September 28, 2025 by hgpu