high performance computing on graphics processing units: hgpu.org


Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications

Rahulkumar Gayatri, Shilei Tian, Stephen Olivier, Johannes Doerfert, Eric Wright

NERSC, Lawrence Berkeley National Laboratory, Berkeley, USA

IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2024


Package: Kokkos C++ Performance Portability Programming EcoSystem


OpenMP provides a cross-vendor API for GPU offload that can serve as an implementation layer under performance portability frameworks like the Kokkos C++ library. However, recent work identified some impediments to performance with this approach arising from limitations in the API or in the available implementations. Advanced programming concepts such as hierarchical parallelism and use of dynamic shared memory were a particular area of concern. In this paper, we apply recent improvements and extensions in the LLVM/Clang OpenMP compiler and runtime library to the Kokkos backend that targets GPUs via OpenMP offload. We focus on efficient hierarchical parallelism and use of fast GPU scratch memory. We compare the performance of applications written using the Kokkos library with this improved OpenMP backend against the same programs using the CUDA and HIP backends. This evaluation shows progress toward closing the performance gaps between native and OpenMP backends and offers insights that may be useful to users and implementers of other runtime systems and programming frameworks for GPUs.
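The sketch below makes the abstract's notion of hierarchical parallelism concrete. It is a minimal C++ illustration, not code from the paper or from the Kokkos OpenMP offload backend, showing how a Kokkos-style two-level decomposition (a league of teams, each team with its own threads and a team-local scratch buffer) can be expressed with OpenMP offload directives. The function `row_sums` and the constant `kMaxCols` are hypothetical. Whether a compiler places the team-local buffer in fast GPU shared memory is implementation-dependent and is assumed here, not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical upper bound on the row length, so the team-local buffer has a
// compile-time size (analogous to requesting a fixed amount of team scratch).
constexpr int kMaxCols = 256;

// Sums each row of a row-major nrows x ncols matrix.
// Level 1: rows are distributed across a league of teams.
// Level 2: within a team, threads cooperate on the columns of one row.
std::vector<double> row_sums(const std::vector<double>& a, int nrows, int ncols) {
  assert(nrows >= 0 && ncols >= 0 && ncols <= kMaxCols);
  assert(static_cast<std::size_t>(nrows) * static_cast<std::size_t>(ncols) == a.size());

  std::vector<double> out(static_cast<std::size_t>(nrows), 0.0);
  const double* A = a.data();
  double* O = out.data();
  const int n = nrows * ncols;

  #pragma omp target teams distribute map(to: A[0:n]) map(from: O[0:nrows])
  for (int r = 0; r < nrows; ++r) {
    // Declared inside the distributed loop: private to the team executing this
    // iteration, and shared by that team's threads in the parallel regions below.
    double tile[kMaxCols];

    // Stage the row into the team-local buffer.
    #pragma omp parallel for
    for (int c = 0; c < ncols; ++c) tile[c] = A[r * ncols + c];

    // Reduce the staged row across the team's threads.
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (int c = 0; c < ncols; ++c) sum += tile[c];

    O[r] = sum;
  }
  return out;
}
```

Compiled without OpenMP, the pragmas are ignored and the loops run serially; with OpenMP enabled but no GPU available, the `target` region typically falls back to the host. In Kokkos terms, the outer loop plays the role of a `TeamPolicy` league and `tile` plays the role of per-team scratch memory.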

Tags: AMD Radeon Instinct MI250X, ATI, Computer science, CUDA, HIP, MPI, nVidia, nVidia A100, OpenMP, Package, performance portability

February 16, 2025 by hgpu

