A Reproducible Study and Performance Analysis of GPU Programming Paradigms: OpenACC vs. CUDA in Key Linear Algebra Computations
Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, 2 Av. de l'Université, Esch-Belval, L-4365 Esch-sur-Alzette, Luxembourg
Research Square, preprint rs.3.rs-5657196/v1, 2024
@article{krishnasamy2024reproducible,
  title={A Reproducible Study and Performance Analysis of GPU Programming Paradigms: OpenACC vs. CUDA in Key Linear Algebra Computations},
  author={Krishnasamy, Ezhilmathi and Bouvry, Pascal},
  year={2024},
  note={Preprint, Research Square, rs.3.rs-5657196/v1}
}
Scientific and engineering problems are frequently governed by partial differential equations; however, analytical solutions to these equations are often unattainable, necessitating numerical methods. Basic Linear Algebra Subprograms (BLAS) operations constitute a fundamental component of these numerical approaches, comprising Level 1 operations (dot products and vector addition), Level 2 operations (matrix-vector multiplication), and Level 3 operations (matrix-matrix multiplication). Graphics Processing Units (GPUs), particularly those produced by NVIDIA, offer substantial computational power and are extensively employed to tackle a variety of numerical problems. Nevertheless, significant obstacles remain in targeting diverse GPU architectures, particularly concerning portability, minimizing architecture-specific workarounds, and achieving high performance. This study employs directive-based programming models, such as OpenACC, to effectively exploit GPU capabilities. We present a comprehensive comparative study and performance evaluation of the OpenACC programming model against CUDA for essential BLAS routines.
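To make the comparison concrete, below is a minimal sketch (not taken from the paper's code) of the directive-based approach the abstract describes: a BLAS Level 1 AXPY operation (y = a*x + y) offloaded to the GPU with a single OpenACC parallel loop directive. The function name, problem size, and compiler invocation are illustrative assumptions.

/* Illustrative sketch (not from the paper's artifact): BLAS Level 1 AXPY,
 * y = a*x + y, offloaded with OpenACC. Names and sizes are placeholders.
 * Compile, for example, with the NVIDIA HPC SDK: nvc -acc saxpy_acc.c */
#include <stdio.h>
#include <stdlib.h>

static void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the device; copy: y is read and copied back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy_acc(n, 3.0f, x, y);

    printf("y[0] = %f (expected 5.0)\n", y[0]);
    free(x);
    free(y);
    return 0;
}

A CUDA implementation of the same routine would instead define an explicit __global__ kernel and manage device buffers with cudaMalloc and cudaMemcpy; that trade-off between programming effort and low-level control is what the study evaluates across BLAS Levels 1-3.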
December 24, 2024 by hgpu