FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
Kahlert School of Computing, University of Utah, USA
arXiv:2403.00232 [cs.AR], (1 Mar 2024)
@misc{li2024fttn,
title={FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA \& AMD Matrix Accelerators},
author={Xinyi Li and Ang Li and Bo Fang and Katarzyna Swirydowicz and Ignacio Laguna and Ganesh Gopalakrishnan},
year={2024},
eprint={2403.00232},
archivePrefix={arXiv},
primaryClass={cs.AR}
}
NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, their numerical behaviors are not publicly documented, including the number of extra precision bits maintained, the accumulation order of addition, and whether subnormal numbers are handled predictably during computations. This makes it impossible to reliably port codes across these differing accelerators. This paper contributes a collection of Feature-Targeted Tests for Numerical Properties (FTTN) that help determine these features across five floating-point formats and four rounding modes, along with additional tests that highlight the rounding behaviors and preservation of extra precision bits. To show the practical relevance of FTTN, we design a simple matrix-multiplication test using insights gathered from our feature tests. We executed this test on five platforms and obtained different answers: V100, A100, and MI250X produced 0, MI100 produced 255.875, and Hopper H100 produced 191.875. Our matrix-multiplication tests employ patterns found in iterative refinement-based algorithms, highlighting the need to check for significant result variability when porting code across GPUs.
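To make two of the properties named in the abstract concrete (accumulation order and extra precision bits), here is a minimal software sketch in Python. It is not the paper's test suite and does not run on a GPU. The helper names `round_to_bits` and `accumulate` are hypothetical. The accumulator is assumed to round to nearest, ties to even, after every addition, and exponent range limits (overflow, subnormals) are ignored. The input values are chosen for illustration only and are unrelated to the results reported above.

```python
import math

def round_to_bits(x, bits):
    """Round x to `bits` significand bits (round-to-nearest, ties-to-even).

    Assumption: unbounded exponent range, so no overflow or subnormals.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    scaled = m * (1 << bits)      # exact: scaling by a power of two
    # Python's round() on a float rounds halfway cases to the even integer.
    return math.ldexp(float(round(scaled)), e - bits)

def accumulate(terms, bits):
    """Sum `terms` left to right, rounding the accumulator after each add.

    Assumption: each intermediate sum fits exactly in a double, which
    holds for the small integer-valued inputs used below.
    """
    acc = 0.0
    for t in terms:
        acc = round_to_bits(acc + t, bits)
    return acc

big = 2.0 ** 24
terms = [big, 1.0, 1.0, -big]        # exact sum is 2
reordered = [1.0, 1.0, big, -big]    # same terms, small values first

print(accumulate(terms, 24))       # 24-bit accumulator, as in binary32
print(accumulate(reordered, 24))   # same precision, different order
print(accumulate(terms, 25))       # one extra precision bit
```

With a 24-bit significand, adding 1 to 2^24 lands on a tie that rounds back to 2^24, so the first call loses both small terms and returns 0.0. Adding the small terms first, or keeping one extra bit in the accumulator, recovers the exact sum of 2.0. Hardware that differs in either property can therefore return different results for the same inputs.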
March 10, 2024 by hgpu