SnuHPL: high performance LINPACK for heterogeneous GPUs
Seoul National University
36th ACM International Conference on Supercomputing (ICS ’22), 2022
@inproceedings{kim2022snuhpl,
title={SnuHPL: high performance LINPACK for heterogeneous GPUs},
author={Kim, Jinpyo and Kwon, Hyungdal and Kang, Jintaek and Park, Jihwan and Lee, Seungwook and Lee, Jaejin},
booktitle={Proceedings of the 36th ACM International Conference on Supercomputing},
pages={1--12},
year={2022}
}
These days, it is typical for a large-scale cluster system to have different kinds of GPUs. However, HPL (High-Performance LINPACK), the de facto standard LINPACK implementation for evaluating the performance of a cluster system, was originally designed to work only on homogeneous CPU-only systems. In this paper, we develop SnuHPL, an optimized HPL for clusters of modern heterogeneous GPUs. To optimize SnuHPL for heterogeneous GPUs, we design a performance model, a SnuHPL simulator based on the model, and a greedy heuristic algorithm based on the simulator. The algorithm generates the best data distribution for a given cluster configuration by considering computing power, memory capacity, and network performance together. We also present a simple technique to increase the energy efficiency of HPL by adjusting the core clock frequency of the GPUs. The evaluation of the data distribution algorithm on small clusters of different GPU combinations shows that it outperforms other well-known data distribution strategies. We show the effectiveness of SnuHPL on a cluster of 1,760 NVIDIA A100-80GB GPUs and 440 A100-40GB GPUs. We also show the effectiveness of the proposed energy optimization technique on a cluster of 144 A100-80GB GPUs.
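To give a rough feel for what a greedy data distribution over heterogeneous GPUs can look like, here is a minimal Python sketch. It is not the SnuHPL algorithm: the abstract says the authors' heuristic is driven by a simulator and also accounts for network performance, neither of which is modeled here. The `Gpu` class, the `speed` and `capacity` fields, and the `distribute_blocks` function are hypothetical names introduced only for illustration. The sketch assigns matrix blocks one at a time to the GPU with the smallest projected finish time, subject to a per-GPU memory limit.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    speed: float   # relative compute throughput (hypothetical units)
    capacity: int  # maximum number of blocks that fit in this GPU's memory


def distribute_blocks(gpus, num_blocks):
    """Greedily assign blocks to GPUs; return the block count per GPU.

    Illustrative only: each block goes to the GPU whose projected
    finish time, (load + 1) / speed, is smallest among the GPUs that
    still have free memory. Network cost is ignored.
    """
    if sum(g.capacity for g in gpus) < num_blocks:
        raise ValueError("not enough total memory for all blocks")

    load = [0] * len(gpus)
    for _ in range(num_blocks):
        # Only GPUs with remaining memory are candidates.
        candidates = [i for i, g in enumerate(gpus) if load[i] < g.capacity]
        best = min(candidates, key=lambda i: (load[i] + 1) / gpus[i].speed)
        load[best] += 1
    return load


if __name__ == "__main__":
    # Hypothetical cluster: one GPU twice as fast as the other.
    cluster = [Gpu("fast", 2.0, 100), Gpu("slow", 1.0, 100)]
    print(distribute_blocks(cluster, 30))
```

With these made-up inputs the faster GPU receives twice as many blocks as the slower one. Lowering a GPU's `capacity` below its compute-proportional share pushes the remaining blocks onto the other GPUs, which is the kind of compute-versus-memory trade-off the abstract describes.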
June 26, 2022 by hgpu