OpenMP offload at the Exascale using Intel GPU Max 1550: evaluation of STREAmS compressible solver

Francesco Salvadore, Giacomo Rossi, Srikanth Sathyanarayana, Matteo Bernardini

HPC Department, CINECA, via dei Tizii 6/B, Rome, 00185, Italy

The Journal of Supercomputing, 2024

@article{salvadore2024openmp,
  title={OpenMP offload at the Exascale using Intel{\textregistered} GPU Max 1550: evaluation of STREAmS compressible solver},
  author={Salvadore, Francesco and Rossi, Giacomo and Sathyanarayana, Srikanth and Bernardini, Matteo},
  journal={The Journal of Supercomputing},
  year={2024}
}

Nearly 20 years after the birth of general purpose GPU computing, the HPC landscape is now dominated by GPUs. After years of undisputed dominance by NVIDIA, new players have entered the arena in a convincing manner, namely AMD and more recently Intel, whose devices currently power the first two clusters in the Top500 ranking. Unfortunately, code porting is still a major problem, even more so with the presence of different vendors, but at the same time the emergence of simplified standard paradigms suggests an encouraging prospect for developers. In this work, we analyze the porting and performance of STREAmS, a community code for compressible fluid dynamics, on Intel® Data Center GPU Max 1550 (formerly called Ponte Vecchio or PVC) based architectures. First, we discuss the porting, based on the offload functionality of the OpenMP 5.x paradigm, and in particular using a hybrid directives/APIs approach that fits smoothly into the multi-backend software ecosystem of STREAmS-2. Second, we analyze the performance of the code on two benchmark clusters powered by PVC, including the exascale Aurora cluster. The performance is evaluated at the different levels of parallelism involved, i.e., the intrinsic parallelism of the PVC tile, the inter-tile parallelism within the GPU configuration, between the GPUs within the node, and between the nodes within the cluster. The analysis shows that although the implementation complexity of the OpenMP porting is limited, it is necessary to follow some important guidelines to achieve satisfactory performance. The PVC GPU shows about 40% higher performance than the NVIDIA A100 or AMD MI250X GPUs, which however were released about 3 years earlier. Both intra-node and inter-node scalability show good results. Overall, the introduction of PVC into the GPU computing HPC landscape represents a positive step forward for diversification and competitiveness in the sector.
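The abstract does not show code, so the following is only a minimal sketch of what a hybrid directives/APIs OpenMP offload pattern can look like in C. It is not taken from STREAmS. The function name `triad_offload` and the triad kernel are hypothetical and chosen for illustration. Device memory is managed through the OpenMP runtime API (`omp_target_alloc`, `omp_target_memcpy`, `omp_target_free`), while the loop itself is offloaded with a directive. When the compiler has no OpenMP support, the sketch falls back to a plain host loop.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example kernel: a[i] = b[i] + s * c[i].
 * Returns 0 on success, -1 if a device allocation or copy fails. */
int triad_offload(double *a, const double *b, const double *c,
                  double s, size_t n)
{
    if (n == 0) {
        return 0;
    }
#ifdef _OPENMP
    const size_t bytes = n * sizeof(double);
    const int dev = omp_get_default_device();   /* target device   */
    const int host = omp_get_initial_device();  /* host "device"   */
    int status = -1;

    /* API part: explicit device allocations. */
    double *d_a = (double *)omp_target_alloc(bytes, dev);
    double *d_b = (double *)omp_target_alloc(bytes, dev);
    double *d_c = (double *)omp_target_alloc(bytes, dev);

    if (d_a && d_b && d_c &&
        omp_target_memcpy(d_b, b, bytes, 0, 0, dev, host) == 0 &&
        omp_target_memcpy(d_c, c, bytes, 0, 0, dev, host) == 0) {

        /* Directive part: the loop runs on the device and uses the
         * device pointers directly, so no map clauses are needed. */
        #pragma omp target teams distribute parallel for \
            is_device_ptr(d_a, d_b, d_c) device(dev)
        for (size_t i = 0; i < n; ++i) {
            d_a[i] = d_b[i] + s * d_c[i];
        }

        /* API part: copy the result back to the host. */
        if (omp_target_memcpy(a, d_a, bytes, 0, 0, host, dev) == 0) {
            status = 0;
        }
    }

    if (d_a) omp_target_free(d_a, dev);
    if (d_b) omp_target_free(d_b, dev);
    if (d_c) omp_target_free(d_c, dev);
    return status;
#else
    /* Fallback without OpenMP: same arithmetic on the host. */
    for (size_t i = 0; i < n; ++i) {
        a[i] = b[i] + s * c[i];
    }
    return 0;
#endif
}
```

Keeping allocation and transfers in API calls, as in this sketch, separates data management from the kernels, which is one plausible reason such a split would suit a code with several backends. The compiler flags needed to enable offload depend on the toolchain and are not specified here.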

Tags: AMD Radeon Instinct MI250X, ATI, Benchmarking, cfd, Compression, Fluid dynamics, Intel, Intel Data Center GPU Max 1550, nVidia, nVidia A100, OpenMP

April 14, 2024 by hgpu
