high performance computing on graphics processing units: hgpu.org

Analyzing Modern NVIDIA GPU cores

Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González

Universitat Politècnica de Catalunya, Barcelona, Spain

arXiv:2503.20481 [cs.AR], (26 Mar 2025)

DOI:10.48550/arXiv.2503.20481

@misc{huerta2025analyzingmodernnvidiagpu,

title={Analyzing Modern NVIDIA GPU cores},

author={Rodrigo Huerta and Mojtaba Abaie Shoushtary and José-Lorenzo Cruz and Antonio González},

year={2025},

eprint={2503.20481},

archivePrefix={arXiv},

primaryClass={cs.AR},

url={https://arxiv.org/abs/2503.20481}

}

GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and scientific simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures that are more than 15 years old. This paper reverse engineers modern NVIDIA GPU cores, unveiling many key aspects of their design and explaining how GPUs leverage hardware-compiler techniques where the compiler guides hardware during execution. In particular, it reveals how the issue logic works, including the policy of the issue scheduler, the structure of the register file and its associated cache, and multiple features of the memory pipeline. Moreover, it analyses how a simple instruction prefetcher based on a stream buffer fits well with modern NVIDIA GPUs and is likely to be used. Furthermore, we investigate the impact of the register file cache and the number of register file read ports on both simulation accuracy and performance. By modeling all these newly discovered microarchitectural details, we achieve 18.24% lower mean absolute percentage error (MAPE) in execution cycles than previous state-of-the-art simulators, resulting in an average of 13.98% MAPE with respect to real hardware (NVIDIA RTX A6000). Also, we demonstrate that this new model holds for other NVIDIA architectures, such as Turing. Finally, we show that the software-based dependence management mechanism included in modern NVIDIA GPUs outperforms a hardware mechanism based on scoreboards in terms of performance and area.
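The accuracy figures above are reported as MAPE over execution cycles. As a minimal sketch of how such a metric is commonly computed, assuming the standard definition (the abstract does not give the paper's exact formula), the function below compares simulated cycle counts against measured ones. The name `mape` and the cycle counts are hypothetical and purely illustrative; they are not data from the paper.

```python
def mape(simulated, measured):
    """Mean absolute percentage error of simulated vs. measured values, in percent.

    Assumes the standard definition: the mean of |simulated - measured| / measured.
    """
    if len(simulated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(s - m) / m for s, m in zip(simulated, measured))
    return 100.0 * total / len(measured)


# Hypothetical cycle counts for three kernels (illustrative only, not from the paper).
measured_cycles = [1000, 2000, 4000]
simulated_cycles = [1100, 1800, 4200]

# Per-kernel errors are 10%, 10%, and 5%, so the mean is about 8.33%.
error = mape(simulated_cycles, measured_cycles)
print(f"MAPE: {error:.2f}%")
```

Because each kernel's error is normalized by its own measured cycle count, short and long kernels contribute equally to the average under this definition.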

Tags: Computer science, Hardware Architecture, nVidia, nVidia RTX A6000

March 30, 2025 by hgpu
