Performance portability evaluation of blocked stencil computations on GPUs
Lawrence Berkeley National Lab, Berkeley, California, USA
International Workshop on Performance, Portability & Productivity in HPC (P3HPC), 2023
In this new era where multiple GPU vendors are leading the supercomputing landscape, and multiple programming models are available to users, the drive to achieve performance portability across platforms faces new challenges. Consider stencil algorithms, where architecture-specific solutions are required to optimize for the parallelism hierarchy and memory hierarchy of emerging systems. In this work, we analyze performance portability of the BrickLib domain-specific library and vector code generator for stencils. BrickLib employs fine-grain data blocking to reduce the large amount of data movement associated with stencils. We compare different GPUs (NVIDIA, AMD and Intel) and their associated programming models (CUDA, HIP and SYCL). By testing a wide range of stencil configurations, we show that overall, BrickLib achieves good performance independent of machine or programming model. Moreover, we introduce correlation models as a new tool for comparing architectures and programming models from Roofline model data.
October 29, 2023 by hgpu