Towards Calculating HPC CUDA Kernel Performance on Nvidia GPUs

Dumeni Manatschal
ETH Zurich
ETH Zurich, 2025

@article{manatschal2025towards,
   title={Towards Calculating HPC CUDA Kernel Performance on Nvidia GPUs},
   author={Manatschal, Dumeni},
   year={2025},
   publisher={ETH Zurich}
}


This thesis lays the groundwork for a performance estimation model for CUDA kernels based on cycle counting. After a short overview of past GPU performance modeling techniques, it presents an in-depth analysis of Nvidia’s SASS instruction set and CUDA ELF formats for the architectures from Maxwell up to and including Blackwell. This analysis gives deep insight into Nvidia’s SASS instruction format and enables precise microbenchmarking based on SASS instructions only, using Python as a tool. Finally, the thesis contributes a VSCode extension that provides a detailed visualization for a custom CUDA kernel disassembler, insights into Nvidia’s SASS instruction scheduling and barrier mechanisms, a series of tutorials to jumpstart understanding of SASS, and a concrete proposal for a Cycle Counting Model that uses data obtainable with the techniques presented in the thesis.