high performance computing on graphics processing units: hgpu.org

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Johannes Pekkilä, Oskar Lappi, Fredrik Robertsén, Maarit J. Korpi-Lagg

Department of Computer Science, Aalto University, Espoo, 02150, Finland

arXiv:2406.08923 [cs.DC], (13 Jun 2024)

DOI:10.48550/arXiv.2406.08923

Package:

Astaroth: A Scalable Multi-GPU Library for Stencil Computations

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent introduction of AMD-manufactured graphics processors to the world’s fastest supercomputers, tuning strategies established for previous hardware generations must be re-evaluated. In this study, we evaluate the performance and energy efficiency of stencil computations on modern datacenter graphics processors, and propose a tuning strategy for fusing cache-heavy stencil kernels. The studied cases comprise both synthetic and practical applications, which involve the evaluation of linear and nonlinear stencil functions in one to three dimensions. Our experiments reveal that AMD and Nvidia graphics processors exhibit key differences in both hardware and software, necessitating platform-specific tuning to reach their full computational potential.
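For readers unfamiliar with the term, a linear stencil computes each output point as a weighted sum of its neighbours. The following is a minimal CPU-side sketch in plain Python, not the paper's implementation and not the Astaroth API; the function name `apply_stencil_1d`, the periodic boundary handling, and the choice of weights are all assumptions made for illustration.

```python
def apply_stencil_1d(values, weights):
    """Apply a linear 1D stencil with periodic boundaries.

    Hypothetical helper for illustration only; `weights` must have odd
    length so that it is centred on the point being updated.
    """
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length")
    radius = len(weights) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        # Weighted sum over the neighbourhood [i - radius, i + radius].
        for k, w in enumerate(weights):
            acc += w * values[(i + k - radius) % n]
        out.append(acc)
    return out


# Second-order central difference for the second derivative
# (unit grid spacing assumed).
laplacian_weights = [1.0, -2.0, 1.0]
field = [float(x * x) for x in range(8)]
result = apply_stencil_1d(field, laplacian_weights)
```

Away from the wrap-around points, the second difference of x² is constant, which gives a quick sanity check on the weights. On a GPU, each output index would typically map to one thread, and the repeated neighbour reads are what make such kernels cache-heavy.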

Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI250X, ATI, Computer science, CUDA, Energy-efficient computing, HIP, nVidia, nVidia A100, nVidia V100, Package, Performance, PyTorch, Stencil computation

June 16, 2024 by hgpu
