Reproducible Study and Performance Analysis of GPU Programming Paradigms: OpenACC vs. CUDA in Key Linear Algebra Computations
Faculty of Science and Technology (FSTM), University of Luxembourg, 2 Av. de l'Université, Esch-Belval, L-4365, Esch-sur-Alzette, Luxembourg
Research Square, preprint rs.3.rs-5657196/v1, 2024
Scientific and engineering problems are frequently governed by partial differential equations; however, analytical solutions of these equations are often impractical, forcing the adoption of numerical methods. Basic Linear Algebra Subprograms (BLAS) operations constitute a fundamental component of these numerical approaches, spanning Level 1 operations (dot products and vector addition), Level 2 operations (matrix-vector multiplication), and Level 3 operations (matrix-matrix multiplication). Graphics Processing Units (GPUs), particularly those produced by NVIDIA, have gained significant computational power and are extensively employed to tackle a variety of numerical challenges. Nevertheless, substantial obstacles remain in targeting diverse GPU architectures, particularly concerning portability, the reduction of workarounds, and the enhancement of performance. This study employs directive-based programming models, such as OpenACC, to effectively exploit GPU capabilities. We undertake a comprehensive comparative study and performance evaluation of the OpenACC programming model against CUDA in executing essential BLAS routines.
December 24, 2024 by hgpu