
Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Aksel Alpay, Vincent Heuveline
Engineering Mathematics and Computing Lab, Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany
Proceedings of the 13th International Workshop on OpenCL and SYCL (IWOCL ’25), 2025

@inproceedings{alpay2025adaptivity,
  title={Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation},
  author={Alpay, Aksel and Heuveline, Vincent},
  booktitle={Proceedings of the 13th International Workshop on OpenCL and SYCL},
  pages={1--12},
  year={2025}
}

Specializing kernels by including runtime information during just-in-time (JIT) compilation can improve performance at the expense of potentially generating more kernels. In this work, we contribute the runtime adaptivity framework that we have implemented in AdaptiveCpp. This framework can automatically generate specialized kernels at JIT time, taking into account various information about the kernel invocation, such as work-group sizes, the data alignment of pointer kernel arguments, or the kernel argument values themselves. While similar approaches have already been investigated for other programming models, to our knowledge AdaptiveCpp is the first SYCL implementation that can automatically leverage such information to generate highly optimized kernels. Our solution is available and enabled by default in the AdaptiveCpp SYCL implementation and supports CPUs, Intel GPUs, NVIDIA GPUs and AMD GPUs. Using a set of mini-apps and benchmarks on NVIDIA, AMD and Intel hardware, we find that AdaptiveCpp with our new framework outperforms CUDA by 30% in the geometric mean, and HIP and oneAPI by 44% and 23%, respectively. Our framework is highly effective, achieving performance gains for SYCL code in excess of 5x in the most extreme cases. We also discuss the impact of each individual optimization technique and find that, on the tested NVIDIA hardware, the combination of all techniques matters, while on AMD and Intel hardware, specializing the work-group size and kernel argument values was most important. Furthermore, we show how a combination of a persistent on-disk JIT cache, careful design and choice of optimization techniques, and a categorization of optimization techniques can mitigate the overhead of the additional JIT compilations to the point where it is no longer a concern for most applications.
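To illustrate the class of kernels such a framework targets, the sketch below is a minimal, self-contained SYCL example; it is our illustration, not code from the paper, and the kernel, variable names, and the work-group size of 256 are hypothetical. Every specialization candidate named in the abstract appears as an ordinary runtime quantity here: the argument value `scale`, the alignment of the USM pointers `in` and `out`, and the explicit work-group size.

    #include <sycl/sycl.hpp>

    int main() {
      sycl::queue q;
      constexpr size_t n = 1 << 20;

      // USM device allocations; a JIT-specializing runtime may assume the
      // observed pointer alignment when generating vectorized loads/stores.
      float* in  = sycl::malloc_device<float>(n, q);
      float* out = sycl::malloc_device<float>(n, q);

      // Ordinary runtime argument; its observed value could be folded into
      // the JIT-compiled kernel as a constant (e.g. scale == 1.0f vanishes).
      float scale = 2.0f;

      // The explicit work-group size (256 here) is another specialization
      // candidate, e.g. for trip-count and barrier optimizations.
      q.parallel_for(
          sycl::nd_range<1>{sycl::range<1>{n}, sycl::range<1>{256}},
          [=](sycl::nd_item<1> it) {
            size_t i = it.get_global_id(0);
            out[i] = scale * in[i];
          });
      q.wait();

      sycl::free(in, q);
      sycl::free(out, q);
    }

Note that if `scale` takes only a few distinct values over an application's lifetime, the cost of the extra JIT compilations is bounded, and, as the abstract points out, a persistent on-disk JIT cache amortizes that cost across runs.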
