high performance computing on graphics processing units: hgpu.org

Performance portability evaluation of blocked stencil computations on GPUs

Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall

Lawrence Berkeley National Lab, Berkeley, California, USA

International Workshop on Performance, Portability & Productivity in HPC (P3HPC), 2023

Package: Brick Layout for C++: Distributed Performance-portable Stencil Computation

In this new era where multiple GPU vendors are leading the supercomputing landscape, and multiple programming models are available to users, the drive to achieve performance portability across platforms faces new challenges. Consider stencil algorithms, where architecture-specific solutions are required to optimize for the parallelism hierarchy and memory hierarchy of emerging systems. In this work, we analyze performance portability of the BrickLib domain-specific library and vector code generator for stencils. BrickLib employs fine-grain data blocking to reduce the large amount of data movement associated with stencils. We compare different GPUs (NVIDIA, AMD and Intel) and their associated programming models (CUDA, HIP and SYCL). By testing a wide range of stencil configurations, we show that overall, BrickLib achieves good performance independent of machine or programming model. Moreover, we introduce correlation models as a new tool for comparing architectures and programming models from Roofline model data.
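To make the idea of fine-grain data blocking concrete, here is a minimal Python sketch. It is not BrickLib's API or implementation; every function name and the grid and brick sizes are hypothetical, chosen only for illustration. It stores a 3D grid as small contiguous bricks and checks that a 7-point stencil gives the same result as on a conventional row-major layout.

```python
# Illustrative sketch only: names and sizes are assumptions, not BrickLib code.

def flat_index(n, x, y, z):
    """Row-major position of (x, y, z) in an n*n*n grid."""
    return (x * n + y) * n + z

def brick_index(n, b, x, y, z):
    """Position of (x, y, z) when the grid is stored as contiguous b*b*b bricks.

    Assumes n is a multiple of b.
    """
    nb = n // b                                            # bricks per dimension
    brick = ((x // b) * nb + (y // b)) * nb + (z // b)     # which brick
    inner = ((x % b) * b + (y % b)) * b + (z % b)          # offset inside the brick
    return brick * b ** 3 + inner

def to_bricks(grid, n, b):
    """Copy a row-major grid into the bricked layout."""
    bricked = [0.0] * (n ** 3)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                bricked[brick_index(n, b, x, y, z)] = grid[flat_index(n, x, y, z)]
    return bricked

def stencil7(data, index, n):
    """Apply a 7-point star stencil to interior points.

    `index` maps (x, y, z) to a storage position, so the same loop
    works for either layout. Boundary points are left at 0.0.
    """
    out = [0.0] * (n ** 3)
    for x in range(1, n - 1):
        for y in range(1, n - 1):
            for z in range(1, n - 1):
                out[index(x, y, z)] = (
                    -6.0 * data[index(x, y, z)]
                    + data[index(x - 1, y, z)] + data[index(x + 1, y, z)]
                    + data[index(x, y - 1, z)] + data[index(x, y + 1, z)]
                    + data[index(x, y, z - 1)] + data[index(x, y, z + 1)]
                )
    return out

n, b = 8, 4  # hypothetical grid and brick sizes
grid = [float((x * 31 + y * 17 + z * 7) % 13)
        for x in range(n) for y in range(n) for z in range(n)]

naive = stencil7(grid, lambda x, y, z: flat_index(n, x, y, z), n)
bricked = stencil7(to_bricks(grid, n, b),
                   lambda x, y, z: brick_index(n, b, x, y, z), n)

# Both layouts should hold identical values at every logical grid point.
matches = all(
    naive[flat_index(n, x, y, z)] == bricked[brick_index(n, b, x, y, z)]
    for x in range(n) for y in range(n) for z in range(n)
)
print("layouts agree:", matches)
```

In the bricked layout, all points of one brick sit next to each other in memory, so a stencil update touches a small number of contiguous blocks rather than widely strided rows. That locality is the general motivation for data blocking; the sketch does not model vectorization, GPU execution, or performance.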

Tags: AMD Radeon Instinct MI250X, ATI, Code generation, Computer science, CUDA, HIP, nVidia, nVidia A100, Package, performance portability, Stencil computation, SYCL

October 29, 2023 by hgpu
