high performance computing on graphics processing units: hgpu.org


Optimizing High-Performance Linpack for Exascale Accelerated Architectures

Noel Chalmers, Jakub Kurzak, Damon McDougall, Paul T. Bauman

Advanced Micro Devices Inc.

arXiv:2304.10397 [cs.DC], (20 Apr 2023)

DOI:10.48550/arXiv.2304.10397

@misc{chalmers2023optimizing,
  title={Optimizing High-Performance Linpack for Exascale Accelerated Architectures},
  author={Noel Chalmers and Jakub Kurzak and Damon McDougall and Paul T. Bauman},
  year={2023},
  eprint={2304.10397},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}


Package:

rocHPL: High Performance Linpack for Next-Generation AMD HPC Accelerators


We detail the performance optimizations made in rocHPL, AMD’s open-source implementation of the High-Performance Linpack (HPL) benchmark targeting accelerated node architectures designed for exascale systems such as the Frontier supercomputer. The implementation leverages the high-throughput GPU accelerators on the node via highly optimized linear algebra libraries, as well as the entire CPU socket to perform latency-sensitive factorization phases. We detail novel performance improvements such as a multi-threaded approach to computing the panel factorization phase on the CPU, time-sharing of CPU cores between processes on the node, as well as several optimizations which hide MPI communication. We present some performance results of this implementation of the HPL benchmark on a single node of the Frontier early access cluster at Oak Ridge National Laboratory, as well as scaling to multiple nodes.
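To make the "panel factorization" phase named in the abstract concrete, the following is a minimal sequential sketch of an LU factorization with partial pivoting applied to one tall panel. It is not taken from rocHPL: the function name `factor_panel` and the list-of-lists layout are assumptions made for illustration, and the multi-threading, CPU core time-sharing, and MPI communication described in the abstract are left out entirely. It shows only the arithmetic that makes this phase latency-sensitive: each column needs a pivot search and a row swap before the update for that column can proceed.

```python
def factor_panel(panel):
    """LU-factor a tall m x n panel in place with partial pivoting.

    Hypothetical helper for illustration only (not rocHPL code).
    On return, the strictly lower part of `panel` holds the multipliers (L)
    and the upper triangle holds U. Returns the pivot row chosen per column.
    """
    m, n = len(panel), len(panel[0])
    pivots = []
    for j in range(n):
        # Pivot search: row with the largest magnitude in column j,
        # looking only at rows j..m-1.
        p = max(range(j, m), key=lambda i: abs(panel[i][j]))
        if panel[p][j] == 0.0:
            raise ZeroDivisionError("panel is singular")
        pivots.append(p)
        # Row swap brings the pivot onto the diagonal.
        panel[j], panel[p] = panel[p], panel[j]
        for i in range(j + 1, m):
            # Scale the column entry to form the multiplier ...
            panel[i][j] /= panel[j][j]
            multiplier = panel[i][j]
            # ... then apply the rank-1 update to the rest of the row.
            for k in range(j + 1, n):
                panel[i][k] -= multiplier * panel[j][k]
    return pivots


if __name__ == "__main__":
    a = [[1.0, 2.0], [4.0, 5.0], [2.0, 1.0]]
    print(factor_panel(a), a)
```

The loop over columns is inherently serial, since column `j + 1` cannot be pivoted until column `j` has been eliminated; only the work inside each column (the search and the row updates) can be split across threads.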

Tags: AMD Radeon Instinct MI250X, ATI, Benchmarking, Computer science, Factorization, HIP, Linear Algebra, MPI, Package, Performance

April 23, 2023 by hgpu

