
Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive

Gabin Schieffer, Jacob Wahlgren, Ruimin Shi, Edgar A. León, Roger Pearce, Maya Gokhale, Ivy Peng
KTH Royal Institute of Technology, Sweden
arXiv:2508.11298 [cs.DC]

@misc{schieffer2025interapucommunicationamdmi300a,
   title={Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive},
   author={Gabin Schieffer and Jacob Wahlgren and Ruimin Shi and Edgar A. León and Roger Pearce and Maya Gokhale and Ivy Peng},
   year={2025},
   eprint={2508.11298},
   archivePrefix={arXiv},
   primaryClass={cs.DC},
   url={https://arxiv.org/abs/2508.11298}
}


The ever-increasing compute performance of GPU accelerators increases the need for efficient data movement within HPC applications to sustain performance. Proposed as a solution to alleviate CPU-GPU data movement, the AMD MI300A Accelerated Processing Unit (APU) combines CPU, GPU, and high-bandwidth memory (HBM) within a single physical package. Leadership supercomputers, such as El Capitan, group four APUs within a single compute node, connected via the Infinity Fabric interconnect. In this work, we design targeted benchmarks to evaluate direct memory access from the GPU, explicit inter-APU data movement, and collective multi-APU communication. We also compare the efficiency of HIP APIs, MPI routines, and the GPU-specialized RCCL library. Our results highlight key design choices for optimizing inter-APU communication on multi-APU AMD MI300A systems with Infinity Fabric, including programming interfaces, allocators, and data movement strategies. Finally, we optimize two real HPC applications, Quicksilver and CloverLeaf, and evaluate them on a four-APU MI300A system.
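To make the explicit inter-APU data-movement path concrete, below is a minimal sketch of the HIP interface, one of the three programming interfaces the paper compares. It assumes a node where two MI300A APUs are exposed as HIP devices 0 and 1: it enables peer access between the devices and issues an explicit device-to-device hipMemcpy, which the runtime can route directly over Infinity Fabric. The 256 MiB buffer size and the synchronous hipMemcpy call are illustrative choices, not the paper's benchmark methodology.

#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Abort on any HIP runtime error.
#define HIP_CHECK(call)                                                   \
  do {                                                                    \
    hipError_t err_ = (call);                                             \
    if (err_ != hipSuccess) {                                             \
      std::fprintf(stderr, "HIP error %s at %s:%d\n",                     \
                   hipGetErrorString(err_), __FILE__, __LINE__);          \
      std::exit(EXIT_FAILURE);                                            \
    }                                                                     \
  } while (0)

int main() {
  int ndev = 0;
  HIP_CHECK(hipGetDeviceCount(&ndev));
  if (ndev < 2) {
    std::fprintf(stderr, "Need at least two APUs visible as HIP devices\n");
    return EXIT_FAILURE;
  }

  // Check that the two APUs can directly access each other's memory,
  // then enable peer access in both directions.
  int can01 = 0, can10 = 0;
  HIP_CHECK(hipDeviceCanAccessPeer(&can01, 0, 1));
  HIP_CHECK(hipDeviceCanAccessPeer(&can10, 1, 0));
  if (!can01 || !can10) {
    std::fprintf(stderr, "Peer access unavailable between devices 0 and 1\n");
    return EXIT_FAILURE;
  }
  HIP_CHECK(hipSetDevice(0));
  HIP_CHECK(hipDeviceEnablePeerAccess(1, 0));
  HIP_CHECK(hipSetDevice(1));
  HIP_CHECK(hipDeviceEnablePeerAccess(0, 0));

  const size_t bytes = 256ull << 20;  // 256 MiB transfer, illustrative size
  void *src = nullptr, *dst = nullptr;
  HIP_CHECK(hipSetDevice(0));
  HIP_CHECK(hipMalloc(&src, bytes));  // source buffer on APU 0
  HIP_CHECK(hipSetDevice(1));
  HIP_CHECK(hipMalloc(&dst, bytes));  // destination buffer on APU 1

  // Explicit inter-APU copy: with peer access enabled, the runtime can
  // move the data directly over Infinity Fabric without staging it in
  // host memory.
  HIP_CHECK(hipSetDevice(0));
  HIP_CHECK(hipMemcpy(dst, src, bytes, hipMemcpyDeviceToDevice));
  HIP_CHECK(hipDeviceSynchronize());

  HIP_CHECK(hipFree(src));
  HIP_CHECK(hipFree(dst));
  return 0;
}

Compile with hipcc. An equivalent transfer could be expressed with hipMemcpyPeer, an MPI point-to-point call on GPU-resident buffers, or an RCCL collective; choosing among these interfaces and allocators is exactly the design space the paper benchmarks.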
