Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study

Giulio Malenza, Valentina Cesare, Marco Edoardo Santimaria, Robert Birke, Alberto Vecchiato, Ugo Becciani, Marco Aldinucci

Department of Computer Science, University of Turin, Italy

Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC24-W), 2024

DOI:10.1109/SCW63240.2024.00157

Source code package: Gaia AVU-GSR

Applications that analyze data from modern scientific experiments will soon require a computing capacity of ExaFLOPs. The current trend to achieve such performance is to employ GPU-accelerated supercomputers and design applications to exploit this hardware optimally. Since each supercomputer is typically a one-off project, the necessity of having computational languages portable across diverse CPU and GPU architectures without performance losses is increasingly compelling. Here, we study the performance portability of the LSQR algorithm as found in the AVU-GSR code of the ESA Gaia mission. This code computes the astrometric parameters of the ~10^8 stars in our Galaxy. The LSQR algorithm is widely used across a broad range of HPC applications, elevating the study’s relevance beyond the astrophysical domain. We developed different GPU-accelerated ports based on CUDA, C++ PSTL, SYCL, OpenMP, and HIP. We carefully verified the correctness of each port and tuned them to five different GPU-accelerated platforms from NVIDIA and AMD to evaluate the performance portability (PP) in terms of the harmonic mean of the application’s performance efficiency across the tested hardware. HIP was demonstrated to be the most portable solution with a 0.94 average PP across the tested problem sizes, closely followed by SYCL coupled with AdaptiveCpp (ACPP) with 0.93. If we only consider NVIDIA platforms, CUDA would be the winner with 0.97. The tuning-oblivious C++ PSTL achieves 0.62 when coupled with vendor-specific compilers.
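
The PP metric described above, a harmonic mean of per-platform performance efficiencies, can be illustrated with a minimal sketch. The function name, the platform labels, and the efficiency values below are hypothetical and are not taken from the paper. Returning 0 when a platform is unsupported follows a common convention for this kind of metric and is an assumption here, not something the abstract states.

```python
from typing import Mapping, Optional


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies, each in (0, 1].

    A value of None marks a platform where the port does not run; in that
    case the metric is taken to be 0.0 (assumed convention).
    """
    if not efficiencies:
        raise ValueError("at least one platform is required")

    inverse_sum = 0.0
    for platform, eff in efficiencies.items():
        if eff is None or eff <= 0.0:
            return 0.0  # unsupported platform: no portability across the set
        if eff > 1.0:
            raise ValueError(f"efficiency for {platform} must not exceed 1.0")
        inverse_sum += 1.0 / eff

    return len(efficiencies) / inverse_sum


# Hypothetical efficiencies (achieved performance / best observed performance).
example = {"gpu_a": 1.0, "gpu_b": 0.5}
print(f"PP = {performance_portability(example):.3f}")
```

Because the harmonic mean is dominated by the smallest values, a single poorly performing platform lowers PP more than it would lower an arithmetic mean, which is why a port needs consistently high efficiency across the tested hardware to score well.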

Tags: AMD Radeon Instinct MI250X, Astrophysics, ATI, Computer science, CUDA, HIP, HPC, nVidia, nVidia A100, nVidia H100, nVidia V100, OpenMP, Package, Performance, performance portability, SYCL, Tesla T4

November 24, 2024 by hgpu
