Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs

Anwar Hossain Zahid, Ignacio Laguna, Wei Le
Department of Computer Science, Iowa State University, Ames, IA
arXiv:2410.09172 [math.NA] (11 Oct 2024)

@misc{zahid2024testinggpunumericsfinding,
   title={Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs},
   author={Anwar Hossain Zahid and Ignacio Laguna and Wei Le},
   year={2024},
   eprint={2410.09172},
   archivePrefix={arXiv},
   primaryClass={math.NA},
   url={https://arxiv.org/abs/2410.09172}
}

As scientific codes are ported between GPU platforms, continuous testing is required to ensure numerical robustness and to identify numerical differences. Compiler-induced numerical differences occur when the same program, compiled and run on different GPUs, produces different numerical results for the same input. We present a study of compiler-induced numerical differences between NVIDIA and AMD GPUs. Our approach uses Varity to generate thousands of short numerical tests in CUDA and HIP, together with their inputs; we then use differential testing to check whether each test produces a numerically inconsistent result when run on these GPUs. We also use the HIPIFY tool to convert CUDA tests into HIP and check whether HIPIFY itself induces numerical inconsistencies. We generated more than 600,000 tests and found subtle numerical differences that come from (1) math library calls, (2) differences in floating-point precision (FP64 versus FP32), and (3) converting code with HIPIFY.
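
The comparison step of differential testing can be illustrated with a minimal Python sketch. It is not the authors' tooling: the helper names (`bits64`, `to_fp32`, `classify`, `compare_results`) and the classification rules are assumptions made for illustration only. The sketch takes the result of one test as reported by two platforms and flags either a change of floating-point class (for example, infinity versus a finite number) or a difference in the binary64 bit pattern. The `to_fp32` helper emulates on the host what happens when one side evaluates in FP32 instead of FP64.

```python
import math
import struct


def bits64(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def to_fp32(x: float) -> float:
    """Round x to binary32 and widen it back (emulates an FP32 evaluation)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def classify(x: float) -> str:
    """Coarse floating-point class used to spot the most severe differences."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "+inf" if x > 0 else "-inf"
    if x == 0.0:
        return "zero"
    return "number"


def compare_results(a: float, b: float) -> str:
    """Compare the outputs of the same test (hypothetically) run on two platforms."""
    ca, cb = classify(a), classify(b)
    if ca != cb:
        return f"class mismatch: {ca} vs {cb}"
    if ca == "nan":
        # Assumption: NaN payloads are not compared; any two NaNs agree.
        return "consistent"
    if bits64(a) == bits64(b):
        return "consistent"
    # Same class but different bits (this also covers +0.0 versus -0.0).
    return "value mismatch"


if __name__ == "__main__":
    print(compare_results(1.0, 1.0))
    print(compare_results(float("inf"), 1e308))
    print(compare_results(0.1, to_fp32(0.1)))
```

Comparing bit patterns rather than using a tolerance is a deliberate choice in this sketch: it reports every difference and leaves it to a later step to decide which ones matter.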
