Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study

Giulio Malenza, Valentina Cesare, Marco Edoardo Santimaria, Robert Birke, Alberto Vecchiato, Ugo Becciani, Marco Aldinucci
Department of Computer Science, University of Turin, Italy
Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC24-W), 2024

@article{malenzaperformance,
   title={Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study},
   author={Malenza, Giulio and Cesare, Valentina and Santimaria, Marco Edoardo and Birke, Robert and Vecchiato, Alberto and Becciani, Ugo and Aldinucci, Marco},
   year={2024}
}

Applications that analyze data from modern scientific experiments will soon require a computing capacity of ExaFLOPs. The current trend to achieve such performance is to employ GPU-accelerated supercomputers and design applications to exploit this hardware optimally. Since each supercomputer is typically a one-off project, the need for computational languages that are portable across diverse CPU and GPU architectures without performance losses is increasingly compelling. Here, we study the performance portability of the LSQR algorithm as found in the AVU-GSR code of the ESA Gaia mission. This code computes the astrometric parameters of the ~10^8 stars in our Galaxy. The LSQR algorithm is widely used across a broad range of HPC applications, elevating the study's relevance beyond the astrophysical domain. We developed different GPU-accelerated ports based on CUDA, C++ PSTL, SYCL, OpenMP, and HIP. We carefully verified the correctness of each port and tuned them to five different GPU-accelerated platforms from NVIDIA and AMD to evaluate the performance portability (PP) in terms of the harmonic mean of the application's performance efficiency across the tested hardware. HIP was demonstrated to be the most portable solution with a 0.94 average PP across the tested problem sizes, closely followed by SYCL coupled with AdaptiveCpp (ACPP) with 0.93. If we only consider NVIDIA platforms, CUDA would be the winner with 0.97. The tuning-oblivious C++ PSTL achieves 0.62 when coupled with vendor-specific compilers.

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
