
Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Aksel Alpay, Vincent Heuveline
Engineering Mathematics and Computing Lab, Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany
Proceedings of the 13th International Workshop on OpenCL and SYCL (IWOCL ’25), 2025

@inproceedings{alpay2025adaptivity,
  title={Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation},
  author={Alpay, Aksel and Heuveline, Vincent},
  booktitle={Proceedings of the 13th International Workshop on OpenCL and SYCL},
  pages={1--12},
  year={2025}
}

Specializing kernels by including runtime information during just-in-time (JIT) compilation can improve performance at the expense of potentially generating more kernels. In this work, we contribute the runtime adaptivity framework that we have implemented in AdaptiveCpp. This framework can automatically generate specialized kernels at JIT time, taking into account various information about the kernel invocation, such as work-group sizes, the data alignment of pointer kernel arguments, or the kernel argument values themselves. While similar approaches have already been investigated for other programming models, to our knowledge AdaptiveCpp is the first SYCL implementation that can automatically leverage such information to generate highly optimized kernels. Our solution is available and enabled by default in the AdaptiveCpp SYCL implementation and supports CPUs, Intel GPUs, NVIDIA GPUs and AMD GPUs. Using a set of mini-apps and benchmarks on NVIDIA, AMD and Intel hardware, we find that AdaptiveCpp with our new framework outperforms CUDA by 30% in the geometric mean, and HIP and oneAPI by 44% and 23%, respectively. Our framework is highly effective, achieving performance gains for SYCL code in excess of 5x in the most extreme cases. We also discuss the impact of each individual optimization technique and find that, on the tested NVIDIA hardware, the combination of all techniques matters, while on AMD and Intel hardware, specializing the work-group size and kernel argument values was most important. Furthermore, we show how a combination of a persistent on-disk JIT cache, careful design and choice of optimization techniques, and a categorization of optimization techniques can mitigate the overhead of the additional JIT compilations to the point where it is no longer a concern for most applications.
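To illustrate the class of kernels such a framework targets, the sketch below is a minimal, self-contained SYCL example; it is our illustration, not code from the paper, and the kernel, variable names, and the work-group size of 256 are hypothetical. Every specialization candidate named in the abstract appears as an ordinary runtime quantity here: the argument value `scale`, the alignment of the USM pointers `in` and `out`, and the explicit work-group size.

    #include <sycl/sycl.hpp>

    int main() {
      sycl::queue q;
      constexpr size_t n = 1 << 20;

      // USM device allocations; a JIT-specializing runtime may assume the
      // observed pointer alignment when generating vectorized loads/stores.
      float* in  = sycl::malloc_device<float>(n, q);
      float* out = sycl::malloc_device<float>(n, q);

      // Ordinary runtime argument; its observed value could be folded into
      // the JIT-compiled kernel as a constant (e.g. scale == 1.0f vanishes).
      float scale = 2.0f;

      // The explicit work-group size (256 here) is another specialization
      // candidate, e.g. for trip-count and barrier optimizations.
      q.parallel_for(
          sycl::nd_range<1>{sycl::range<1>{n}, sycl::range<1>{256}},
          [=](sycl::nd_item<1> it) {
            size_t i = it.get_global_id(0);
            out[i] = scale * in[i];
          });
      q.wait();

      sycl::free(in, q);
      sycl::free(out, q);
    }

Note that if `scale` takes only a few distinct values over an application's lifetime, the cost of the extra JIT compilations is bounded, and, as the abstract points out, a persistent on-disk JIT cache amortizes that cost across runs.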
