high performance computing on graphics processing units: hgpu.org


Accurate Models of NVIDIA Tensor Cores

Faizan A. Khattak, Mantas Mikaitis

School of Computer Science, University of Leeds, Leeds, UK

arXiv:2512.07004 [cs.MS]

DOI:10.48550/arXiv.2512.07004

@misc{khattak2025accuratemodelsnvidiatensor,
  title={Accurate Models of NVIDIA Tensor Cores},
  author={Faizan A. Khattak and Mantas Mikaitis},
  year={2025},
  eprint={2512.07004},
  archivePrefix={arXiv},
  primaryClass={cs.MS},
  url={https://arxiv.org/abs/2512.07004}
}


Source codes package: MATLAB Tensor Core models


Matrix multiplication is a fundamental operation in both the training and the inference of neural networks. To accelerate it, Graphics Processing Units (GPUs) implement matrix multiplication in hardware. Because these hardware multipliers offer higher throughput than software-based matrix multiplication, they are increasingly used outside of AI to accelerate a range of scientific computing applications. However, matrix multipliers targeted at AI are at present not compliant with IEEE 754 floating-point arithmetic, and different vendors offer different numerical features. This leads to non-reproducible results across generations of GPU architectures at the level of the matrix multiply-accumulate instruction. To study the numerical characteristics of matrix multipliers, such as rounding behaviour, accumulator width, normalization points and extra carry bits, test vectors are typically constructed. Yet these vectors may or may not distinguish between different hardware models, and because hardware availability is limited, their reliability across many different platforms remains largely untested. We present software models that emulate the inner-product behaviour of the low- and mixed-precision matrix multipliers in the V100, A100, H100 and B200 data center GPUs, in most of the supported input formats of interest to mixed-precision algorithm developers: 8-, 16- and 19-bit floating point.
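The abstract names the features that make these multipliers differ from IEEE 754 arithmetic (alignment, accumulator width, rounding). As a rough illustration only, the Python sketch below models one dot-product-with-accumulate step that forms products exactly, aligns every term to the largest exponent while truncating the bits shifted out, adds the aligned terms, and truncates the result. The names `model_dot`, `exponent` and `trunc_to_multiple`, and the parameters `align_bits` and `out_bits`, are hypothetical; the default widths are placeholders, not values measured for the V100, A100, H100 or B200. The authors' MATLAB Tensor Core models package is the reference for the actual hardware models.

```python
from fractions import Fraction
import math


def exponent(x: Fraction) -> int:
    """Return e with 2**e <= |x| < 2**(e+1); x must be nonzero."""
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:  # the bit-length estimate can be one too high
        e -= 1
    return e


def trunc_to_multiple(x: Fraction, lsb_exp: int) -> Fraction:
    """Round x toward zero to a multiple of 2**lsb_exp (drops shifted-out bits)."""
    scale = Fraction(2) ** lsb_exp
    return math.trunc(x / scale) * scale


def model_dot(a, b, c=0.0, align_bits=24, out_bits=24):
    """Hypothetical block multiply-accumulate model: d = c + sum(a[i] * b[i]).

    Assumptions (illustrative, not fitted to any specific GPU):
      * products a[i] * b[i] are formed exactly;
      * every term is aligned to the largest exponent, keeping `align_bits`
        bits below that exponent's leading bit and truncating the rest;
      * the aligned terms are added exactly, then the sum is truncated
        (round toward zero) to `out_bits` significant bits.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    terms = [Fraction(x) * Fraction(y) for x, y in zip(a, b)]
    terms.append(Fraction(c))
    nonzero = [t for t in terms if t != 0]
    if not nonzero:
        return 0.0
    emax = max(exponent(t) for t in nonzero)
    # Alignment step: each term keeps only bits down to 2**(emax - align_bits).
    aligned = sum(trunc_to_multiple(t, emax - align_bits) for t in terms)
    if aligned == 0:
        return 0.0
    # Final normalization and truncation to out_bits significant bits.
    e = exponent(aligned)
    return float(trunc_to_multiple(aligned, e - (out_bits - 1)))


if __name__ == "__main__":
    a = [1.0] + [2.0**-25] * 4
    b = [1.0] * 5
    exact = sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))
    print(model_dot(a, b))              # 1.0: each 2**-25 term is truncated away
    print(float(exact) == 1 + 2**-23)   # True
```

In this example the four small products vanish during alignment, so the model returns 1.0 although the exact value, 1 + 2**-23, is representable with 24 significant bits. Calling `model_dot(a, b, align_bits=30)` keeps those bits and returns the exact value; differences of this kind are what test vectors are built to expose.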

Tags: Computer science, CUDA, Matrix multiplication, nVidia, nVidia B200, nVidia H100, nVidia V100, Package

December 15, 2025 by hgpu


Recent source codes

Awesome LLM-Driven Kernel Generation

PhysProver: Advancing Automatic Theorem Proving for Physics

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

SeedFold: Scaling Biomolecular Structure Prediction

Tilus: A Tile-Level GPU Kernel Programming Language

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

BoltzGen: Toward Universal Binder Design

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

MATLAB Tensor Core models
