Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A

Aaron Jarmusch, Connor Vitz, Sunita Chandrasekaran

University of Delaware, Newark, Delaware, USA

arXiv:2602.10262 [cs.DC], (10 Feb 2026)

DOI:10.48550/arXiv.2602.10262

@misc{jarmusch2026executioncentric,
  title={Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A},
  author={Aaron Jarmusch and Connor Vitz and Sunita Chandrasekaran},
  year={2026},
  eprint={2602.10262},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2602.10262}
}

The AMD MI300A APU integrates CDNA3 GPUs with high-bandwidth memory and advanced accelerator features: FP8 matrix cores, asynchronous compute engines (ACE), and 2:4 structured sparsity. These capabilities are increasingly relied upon by modern HPC and HPC-AI workloads, yet their execution characteristics and system-level implications remain insufficiently understood. In this paper, we present an execution-centric characterization of FP8 matrix execution, ACE concurrency, and structured sparsity on MI300A using targeted microbenchmarks. We quantify occupancy thresholds, fairness, throughput trade-offs under concurrent execution, and context-dependent sparsity benefits. We evaluate representative case studies – transformer-style, concurrent, and mixed-precision kernels – to show how these effects translate into application-level performance and predictability. Our results provide practical guidance for occupancy-aware scheduling, concurrency decisions, and sparsity enablement on MI300A-class unified nodes.
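For readers unfamiliar with the 2:4 structured sparsity pattern named in the abstract, the constraint is that at most two of every four consecutive values are non-zero. The following is a minimal sketch of that pattern in plain Python, assuming simple magnitude-based pruning. It is not the paper's microbenchmark and says nothing about how MI300A stores or executes sparse operands; the function names `prune_2_of_4` and `is_2_of_4` are hypothetical and chosen only for illustration.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes (ties broken by position).
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2_of_4(row):
    """Check the constraint: at most two non-zeros per group of four."""
    return all(
        sum(1 for v in row[s:s + 4] if v != 0) <= 2
        for s in range(0, len(row), 4)
    )


# Illustrative weights only; these values are made up for the example.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
sparse = prune_2_of_4(weights)
```

Because exactly half of each group is zeroed, a 2:4 operand can in principle be stored as the two kept values plus small per-group position metadata. Whether that translates into a speedup depends on context, which is the question the paper's sparsity measurements address.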

Tags: AMD, AMD Radeon Instinct MI300A, Benchmarking, Computer science, HIP, Performance, ROCm

February 16, 2026 by hgpu
