Exploring SYCL for batched kernels with memory allocations
Univ. Paris-Saclay, UVSQ, CNRS, CEA, Maison de la Simulation, 91191, Gif-sur-Yvette, France
hal-05015978 (23 May 2025)
Batched kernels with memory allocations are a common pattern in HPC, appearing in multi-dimensional FFTs, neural network processing, and split computation of numerical operators. Supporting this pattern efficiently is especially complex on GPUs, where memory per work-item is limited and dynamic memory allocation is challenging. This study investigates whether the native abstractions of SYCL can support performance portability for this pattern. We implement versions of a batched semi-Lagrangian advection kernel using each parallel construct of SYCL. We evaluate them in terms of maintainability, performance portability, and memory footprint on CPUs and GPUs (AMD, Intel, NVIDIA), with two distinct SYCL implementations (AdaptiveCpp and DPC++). Our results demonstrate that no single parallel construct of SYCL emerges as the best solution and that a construct offering a higher level of abstraction would be required to support this common pattern.
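To illustrate the pattern the abstract refers to (this is a minimal sketch, not the paper's advection kernel), a SYCL 2020 batched kernel might map one work-group to each batch element and keep the per-batch scratch in work-group local memory; the sizes, names, and the simple averaging stencil below are illustrative assumptions.

#include <sycl/sycl.hpp>

// Minimal sketch of the batched-kernel-with-scratch pattern:
// one work-group per batch element, per-batch scratch in local memory.
// batch_size, n, and the stencil are assumptions, not the paper's kernel.
int main() {
  constexpr size_t batch_size = 1024; // number of independent 1D problems
  constexpr size_t n = 256;           // size of each problem

  sycl::queue q;
  float *data = sycl::malloc_device<float>(batch_size * n, q);

  q.submit([&](sycl::handler &cgh) {
     // One scratch buffer per work-group, sized at kernel launch.
     sycl::local_accessor<float, 1> scratch(sycl::range<1>(n), cgh);

     cgh.parallel_for(
         sycl::nd_range<1>(sycl::range<1>(batch_size * n), sycl::range<1>(n)),
         [=](sycl::nd_item<1> it) {
           const size_t b = it.get_group(0);    // batch index
           const size_t i = it.get_local_id(0); // index within the batch

           // Stage the batch element in scratch, synchronize, write back.
           scratch[i] = data[b * n + i];
           sycl::group_barrier(it.get_group());
           data[b * n + i] = 0.5f * (scratch[i] + scratch[(i + 1) % n]);
         });
   }).wait();

  sycl::free(data, q);
  return 0;
}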
May 25, 2025 by hgpu