

Can Tensor Cores Benefit Memory-Bound Kernels? (No!)

Lingqi Zhang, Jiajun Huang, Sheng Di, Satoshi Matsuoka, Mohamed Wahib

RIKEN Center for Computational Science, Japan

arXiv:2502.16851 [cs.DC]

DOI:10.48550/arXiv.2502.16851

@misc{zhang2025tensorcoresbenefitmemorybound,
  title={Can Tensor Cores Benefit Memory-Bound Kernels? (No!)},
  author={Lingqi Zhang and Jiajun Huang and Sheng Di and Satoshi Matsuoka and Mohamed Wahib},
  year={2025},
  eprint={2502.16851},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2502.16851}
}


Tensor cores are specialized processing units within GPUs that have demonstrated significant efficiency gains in compute-bound applications such as deep learning training by accelerating dense matrix operations. Given their success, researchers have attempted to extend tensor core capabilities beyond dense matrix computations to other computational patterns, including memory-bound kernels. Recent studies have reported that tensor cores can outperform traditional CUDA cores even on memory-bound kernels, where the primary performance bottleneck is not computation. In this research, we challenge these findings through both theoretical and empirical analysis. Our theoretical analysis reveals that tensor cores can achieve a maximum speedup of only 1.33x over CUDA cores for memory-bound kernels in double precision (for V100, A100, and H100 GPUs). We validate this theoretical limit through empirical analysis of three representative memory-bound kernels: STREAM Scale, SpMV, and stencil. We demonstrate that optimizing memory-bound kernels using tensor cores does not yield meaningful performance improvements over CUDA cores.
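
To see why a ceiling near 1.33x is plausible, here is a back-of-the-envelope sketch. It assumes (this is an illustration, not necessarily the derivation used in the paper) that memory traffic and arithmetic do not overlap, so kernel time = T_mem + T_comp, and that FP64 tensor cores run the arithmetic about 2x faster than FP64 CUDA cores, which is roughly the peak ratio on A100 and H100. The function names `speedup_bound` and `compute_to_memory_ratio` and the A100 peak figures are assumptions chosen for the example.

```python
"""Back-of-the-envelope bound for tensor cores on memory-bound kernels.

Assumption (illustrative, not taken from the paper): memory traffic and
arithmetic do NOT overlap, so kernel time = T_mem + T_comp.
Tensor cores can only shrink T_comp; T_mem is unchanged.
"""


def speedup_bound(compute_ratio: float, tc_speedup: float = 2.0) -> float:
    """Tensor-core speedup over CUDA cores under the additive timing model.

    compute_ratio: T_comp / T_mem on CUDA cores; a memory-bound kernel has
        0 <= compute_ratio <= 1.
    tc_speedup: factor by which tensor cores accelerate the arithmetic
        (assumed 2x, roughly the FP64 tensor-core / FP64 CUDA-core peak
        ratio on A100 and H100).
    """
    if not 0.0 <= compute_ratio <= 1.0:
        raise ValueError("memory-bound requires 0 <= T_comp / T_mem <= 1")
    t_cuda = 1.0 + compute_ratio                 # times normalized so T_mem = 1
    t_tensor = 1.0 + compute_ratio / tc_speedup  # only the compute part shrinks
    return t_cuda / t_tensor


def compute_to_memory_ratio(flops: float, bytes_moved: float,
                            peak_flops: float, peak_bandwidth: float) -> float:
    """T_comp / T_mem at peak rates, i.e. arithmetic intensity / ridge point."""
    intensity = flops / bytes_moved           # FLOP per byte of the kernel
    ridge = peak_flops / peak_bandwidth       # FLOP per byte where the roofline bends
    return intensity / ridge


# Worst case for the claim: a kernel exactly at the ridge point (T_comp == T_mem).
print(f"upper bound: {speedup_bound(1.0):.3f}x")  # 2 / 1.5 = 4/3

# STREAM Scale in FP64, a[i] = q * b[i]: 1 FLOP per 8-byte read + 8-byte write.
# Assumed peak figures: A100 40GB, 9.7 TFLOP/s FP64 on CUDA cores, 1555 GB/s.
r = compute_to_memory_ratio(flops=1, bytes_moved=16,
                            peak_flops=9.7e12, peak_bandwidth=1.555e12)
print(f"STREAM Scale: T_comp/T_mem = {r:.4f}, bound = {speedup_bound(r):.4f}x")
```

Because the bound grows with T_comp/T_mem and a memory-bound kernel has that ratio at most 1, the largest possible value in this model is (1 + 1) / (1 + 1/2) = 4/3. A kernel far below the ridge point, such as STREAM Scale, gains well under 1% even in this no-overlap model, and if memory and compute overlapped perfectly (time = max(T_mem, T_comp)) the predicted gain would be exactly 1x.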

Tags: Computer science, CUDA, nVidia, nVidia A100, Performance

March 10, 2025 by hgpu
