Towards Calculating HPC CUDA Kernel Performance on Nvidia GPUs
ETH Zurich
ETH Zurich, 2025
@article{manatschal2025towards,
title={Towards Calculating HPC CUDA Kernel Performance on Nvidia GPUs},
author={Manatschal, Dumeni},
year={2025},
publisher={ETH Zurich}
}
This thesis lays the groundwork for a performance estimation model for CUDA kernels based on cycle counting. After a short overview of past GPU performance modeling techniques, it presents an in-depth analysis of Nvidia’s SASS instruction set and CUDA ELF formats for the architectures from Maxwell up to and including Blackwell. This analysis gives deep insight into Nvidia’s SASS instruction format and enables precise microbenchmarking based on SASS instructions only, using Python as a tool. Finally, in addition to a VSCode extension providing an in-depth visualization for a custom CUDA kernel disassembler, the thesis provides insights into Nvidia’s SASS instruction scheduling and barrier mechanisms, a series of tutorials to jumpstart understanding of SASS, and a concrete proposal for a cycle counting model that uses data obtainable with the techniques presented in this thesis.
September 14, 2025 by hgpu