Exploring SYCL for batched kernels with memory allocations

Aymeric Millan, Thomas Padioleau, Julien Bigot
Univ. Paris-Saclay, UVSQ, CNRS, CEA, Maison de la Simulation, 91191, Gif-sur-Yvette, France
hal-05015978, (23 May 2025)

Batched kernels with memory allocations are a common pattern in HPC, appearing in multi-dimensional FFTs, neural network processing, and split computation of numerical operators. Supporting this pattern efficiently is especially difficult on GPUs, where memory per work-item is limited and dynamic memory allocation is challenging. This study investigates whether the native abstractions of SYCL can support performance portability for this pattern. We implement versions of a batched semi-Lagrangian advection kernel using each parallel construct of SYCL. We evaluate them in terms of maintainability, performance portability, and memory footprint on CPUs and GPUs (AMD, Intel, NVIDIA), with two distinct SYCL implementations (AdaptiveCpp and DPC++). Our results demonstrate that no single parallel construct of SYCL emerges as the best solution and that a construct offering a higher level of abstraction would be required to support this common pattern.
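
To make the pattern concrete, here is a minimal sketch in plain, sequential C++ of a batched kernel that needs scratch memory. It is not the authors' implementation and uses no SYCL API. The function name `advect_batched`, the row-major layout, the periodic grid, and the linear interpolation are all assumptions chosen for illustration. Each batch entry is a 1D line updated in place, so the kernel needs a scratch copy of the old values. Where that scratch buffer lives is the allocation question the abstract raises.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: shifts every row of an (n_batch x nx) row-major array
// by `shift` grid cells on a periodic 1D grid, using linear interpolation.
// All names and numerical choices here are assumptions, not the paper's code.
void advect_batched(std::vector<double>& data, std::size_t n_batch,
                    std::size_t nx, double shift) {
    // Scratch memory: the update is in place, so each row needs a copy of
    // its old values. Here one buffer is allocated on the host and reused
    // by every batch entry, which only works because the loop is sequential.
    std::vector<double> scratch(nx);
    const double len = static_cast<double>(nx);

    // Batch loop: the loop one would map onto a parallel construct.
    for (std::size_t b = 0; b < n_batch; ++b) {
        double* row = data.data() + b * nx;
        for (std::size_t i = 0; i < nx; ++i) scratch[i] = row[i];

        for (std::size_t i = 0; i < nx; ++i) {
            // Departure point of grid node i, wrapped into [0, nx).
            double x = std::fmod(static_cast<double>(i) - shift, len);
            if (x < 0.0) x += len;
            if (x >= len) x -= len;  // guards against rounding up to nx

            const std::size_t i0 = static_cast<std::size_t>(x);
            const std::size_t i1 = (i0 + 1) % nx;
            const double w = x - static_cast<double>(i0);

            // Linear interpolation between the two neighbouring old values.
            row[i] = (1.0 - w) * scratch[i0] + w * scratch[i1];
        }
    }
}
```

Calling `advect_batched(d, 2, 4, 1.0)` on two rows of four values moves each row one cell to the right with periodic wrap-around. In a parallel version every batch entry processed concurrently would need its own scratch line, which is why memory footprint and per-work-item memory limits become central concerns.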
