high performance computing on graphics processing units: hgpu.org

SnuHPL: high performance LINPACK for heterogeneous GPUs

Jinpyo Kim, Hyungdal Kwon, Jintaek Kang, Jihwan Park, Seungwook Lee, Jaejin Lee

Seoul National University

36th ACM International Conference on Supercomputing (ICS ’22), 2022

DOI:10.1145/3524059.3532370

Large-scale cluster systems now typically contain several different kinds of GPUs. However, HPL (High-Performance LINPACK), the de facto standard LINPACK implementation for evaluating the performance of a cluster system, was originally designed to work only on homogeneous CPU-only systems. In this paper, we develop SnuHPL, an optimized HPL for clusters of modern heterogeneous GPUs. To optimize SnuHPL for heterogeneous GPUs, we design a performance model, a SnuHPL simulator based on the model, and a greedy heuristic algorithm based on the simulator. The algorithm generates the best data distribution for a given cluster configuration by considering computing power, memory capacity, and network performance together. We also present a simple technique that increases the energy efficiency of HPL by adjusting the core clock frequency of the GPUs. The evaluation of the data distribution algorithm on small clusters of different GPU combinations shows that it outperforms other well-known data distribution strategies. We show the effectiveness of SnuHPL on a cluster of 1,760 NVIDIA A100-80GB GPUs and 440 A100-40GB GPUs. We also show the effectiveness of the proposed energy optimization technique on a cluster of 144 A100-80GB GPUs.
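The abstract does not spell out the greedy heuristic, so the following is only a minimal sketch of the general idea of distributing matrix blocks across unequal GPUs. It is not the SnuHPL algorithm. It assumes a simplified cost model in which a GPU's finish time is its block count divided by its throughput, it ignores network performance entirely, and it uses no simulator. The names `Gpu`, `tflops`, `capacity`, and `distribute_blocks` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    tflops: float   # assumed sustained throughput (hypothetical figure)
    capacity: int   # assumed number of matrix blocks that fit in GPU memory


def distribute_blocks(gpus, total_blocks):
    """Greedily assign blocks one at a time.

    Each block goes to the GPU whose projected finish time
    (blocks assigned + 1) / tflops is smallest, skipping GPUs whose
    memory is already full. This is an illustrative cost model only.
    """
    if total_blocks > sum(g.capacity for g in gpus):
        raise ValueError("matrix does not fit in the combined GPU memory")

    assigned = {g.name: 0 for g in gpus}
    for _ in range(total_blocks):
        # Only GPUs with free memory are candidates for the next block.
        candidates = [g for g in gpus if assigned[g.name] < g.capacity]
        best = min(candidates, key=lambda g: (assigned[g.name] + 1) / g.tflops)
        assigned[best.name] += 1
    return assigned


if __name__ == "__main__":
    cluster = [
        Gpu("fast", tflops=2.0, capacity=100),
        Gpu("slow", tflops=1.0, capacity=100),
    ]
    print(distribute_blocks(cluster, 30))
```

With ample memory the split follows the throughput ratio. When the faster GPU has less memory, its share is capped and the remainder moves to the slower GPU, which is the kind of trade-off between computing power and memory capacity that the abstract mentions.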

Tags: Benchmarking, Computer science, nVidia, nVidia A100

June 26, 2022 by hgpu
