Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications
NERSC, Lawrence Berkeley National Laboratory, Berkeley, USA
IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2024
@inproceedings{gayatri2024leveraging,
  title={Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications},
  author={Gayatri, Rahulkumar and Tian, Shilei and Olivier, Stephen L. and Wright, Eric and Doerfert, Johannes},
  booktitle={IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)},
  year={2024}
}
OpenMP provides a cross-vendor API for GPU offload that can serve as an implementation layer under performance portability frameworks like the Kokkos C++ library. However, recent work identified some impediments to performance with this approach arising from limitations in the API or in the available implementations. Advanced programming concepts such as hierarchical parallelism and use of dynamic shared memory were a particular area of concern. In this paper, we apply recent improvements and extensions in the LLVM/Clang OpenMP compiler and runtime library to the Kokkos backend that targets GPUs via OpenMP offload. We focus on efficient hierarchical parallelism and use of fast GPU scratch memory. We compare the performance of applications written using the Kokkos library with this improved OpenMP backend against the same programs using the CUDA and HIP backends. This evaluation shows progress toward closing the performance gaps between native and OpenMP backends and offers insights that may be useful to users and implementers of other runtime systems and programming frameworks for GPUs.
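For readers unfamiliar with the term, hierarchical parallelism can be illustrated with a small, generic OpenMP offload sketch. This is not code from the paper or from the Kokkos OpenMP target backend; the function name `row_sums` and the row-sum workload are hypothetical choices for illustration. Teams are distributed over matrix rows (the outer level), and the threads within each team cooperatively reduce over that row's columns (the inner level). The sketch does not touch GPU scratch memory. When compiled without OpenMP or run without an offload device, the same code should simply execute on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: two-level (team / thread) parallelism with OpenMP offload.
// Outer level: one row per team iteration. Inner level: threads of the team
// reduce across the columns of that row.
std::vector<double> row_sums(const std::vector<double>& a, int rows, int cols) {
    // Precondition: 'a' holds a dense row-major rows x cols matrix.
    assert(a.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));

    std::vector<double> out(rows, 0.0);
    const double* pa = a.data();  // raw pointers so array sections can be mapped
    double* po = out.data();

    #pragma omp target teams distribute map(to: pa[0:rows*cols]) map(from: po[0:rows])
    for (int r = 0; r < rows; ++r) {
        double sum = 0.0;  // private to the team handling row r
        #pragma omp parallel for reduction(+:sum)
        for (int c = 0; c < cols; ++c) {
            sum += pa[r * cols + c];
        }
        po[r] = sum;  // written once per row after the inner reduction
    }
    return out;
}
```

Calling `row_sums` on a 2 x 3 matrix stored as `{1, 2, 3, 4, 5, 6}` yields one sum per row. In Kokkos terms, the outer loop roughly corresponds to a team policy over rows and the inner loop to a nested team-level reduction, although the actual backend mapping is more involved than this sketch suggests.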
February 16, 2025 by hgpu