A Study of Floating-Point Precision Tuning in Deep Learning Operators Implementations
State Key Laboratory for Novel Software Technology, Nanjing University, China
ACM Transactions on Software Engineering and Methodology, 2025
DOI: 10.1145/3773992
@article{wen2025study,
  title={A Study of Floating-Point Precision Tuning in Deep Learning Operators Implementations},
  author={Wen, Zhongzhen and Liu, Hongyu and Zhu, Tingwei and Pan, Minxue and Wang, Shaohua and Lin, Yuanyi and Liu, Kairui and Zhang, Tian and Li, Xuandong},
  journal={ACM Transactions on Software Engineering and Methodology},
  year={2025},
  publisher={ACM New York, NY}
}
Deep learning (DL) has already played a significant role in numerous fields, making it crucial to ensure the stability of both training and inference in DL systems. The computation of DL models can be viewed as the execution of a series of DL operators, which are essential components that perform the core numerical computations. Therefore, the numerical stability of these operators is critical for the overall quality of these systems. Operators with numerical instabilities can cause degraded model performance or even numerical exceptions that disrupt the training process. A common practice to improve numerical stability is to increase the precision of specific floating-point computations. However, determining the appropriate data type for internal computations, referred to as precision tuning, remains a significant challenge. This paper presents the first systematic study of precision tuning in the implementation of DL operators. We introduce OpScanner, a tool capable of extracting the precision-casting operations that are critical to the numerical stability of operators within existing DL frameworks. Using the captured operations, we conduct a comprehensive categorization and comparative analysis between two popular DL frameworks. Our findings can help developers understand precision tuning in operators and implement new operators. Based on the results of our comparative analysis, we conduct precision testing experiments on a set of operators from both frameworks to examine the impact of precision tuning. The test results show that the tested TensorFlow operators produce significantly more floating-point errors than their PyTorch counterparts due to a lack of precision tuning. One of the most inaccurate operators has been confirmed by the TensorFlow community to have numerical issues. Additionally, our study examines the effect of precision tuning on weight updates during training. We found that larger errors, resulting from a lack of tuning, lead to greater weight deviations during training.
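To illustrate the kind of precision tuning the paper studies, here is a minimal NumPy sketch (not taken from the paper or from OpScanner; the softmax example and function names are purely illustrative). One half-precision softmax keeps every intermediate in float16, while the tuned variant casts the input up to float32 for the internal exponentiation and reduction and casts the result back down.

```python
import numpy as np

def softmax_fp16_naive(x):
    """Softmax where every intermediate stays in float16."""
    x = x.astype(np.float16)
    e = np.exp(x - np.max(x))          # exp and max computed in float16
    return e / np.sum(e)               # reduction accumulates in float16

def softmax_fp16_tuned(x):
    """Softmax with internal computation upcast to float32 (precision tuning)."""
    x32 = x.astype(np.float32)         # cast the half-precision input up
    e = np.exp(x32 - np.max(x32))      # exp and reduction run in float32
    return (e / np.sum(e)).astype(np.float16)  # cast the result back down

rng = np.random.default_rng(0)
x = rng.normal(scale=8.0, size=4096).astype(np.float16)

# Compare both float16 paths against a float64 reference.
ref = np.exp(x.astype(np.float64) - np.max(x.astype(np.float64)))
ref /= np.sum(ref)
err_naive = np.abs(softmax_fp16_naive(x).astype(np.float64) - ref).max()
err_tuned = np.abs(softmax_fp16_tuned(x).astype(np.float64) - ref).max()
print(f"max abs error, naive float16 softmax: {err_naive:.3e}")
print(f"max abs error, tuned float16 softmax: {err_tuned:.3e}")
```

On inputs with a wide dynamic range, the naive path typically shows a noticeably larger maximum error against the float64 reference than the tuned path, which mirrors, at toy scale, the kind of error difference the study's precision-testing experiments measure across operators.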
November 2, 2025 by hgpu