Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive
KTH Royal Institute of Technology, Sweden
arXiv:2508.11298 [cs.DC]
@misc{schieffer2025interapucommunicationamdmi300a,
  title={Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive},
  author={Gabin Schieffer and Jacob Wahlgren and Ruimin Shi and Edgar A. León and Roger Pearce and Maya Gokhale and Ivy Peng},
  year={2025},
  eprint={2508.11298},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2508.11298}
}
The ever-increasing compute performance of GPU accelerators drives up the need for efficient data movement within HPC applications to sustain performance. Proposed as a solution to alleviate CPU-GPU data movement, the AMD MI300A Accelerated Processing Unit (APU) combines CPU, GPU, and high-bandwidth memory (HBM) within a single physical package. Leadership supercomputers, such as El Capitan, group four APUs within a single compute node, connected via the Infinity Fabric interconnect. In this work, we design specific benchmarks to evaluate direct memory access from the GPU, explicit inter-APU data movement, and collective multi-APU communication. We also compare the efficiency of HIP APIs, MPI routines, and the GPU-specialized RCCL library. Our results highlight key design choices for optimizing inter-APU communication on multi-APU AMD MI300A systems with Infinity Fabric, including programming interfaces, allocators, and data movement. Finally, we optimize two real HPC applications, Quicksilver and CloverLeaf, and evaluate them on a four-MI300A APU system.
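To make "explicit inter-APU data movement" concrete, the sketch below copies a buffer between two APUs in the same node through the HIP peer-to-peer API. This is not the paper's benchmark code: the device IDs, transfer size, and error-handling macro are illustrative assumptions, and the paper additionally compares this path against MPI routines, RCCL collectives, and different allocators.

// Minimal sketch (assumed setup, not the authors' benchmark): an explicit
// device-to-device copy between two MI300A APUs over Infinity Fabric using HIP.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

#define HIP_CHECK(expr)                                               \
  do {                                                                \
    hipError_t err_ = (expr);                                         \
    if (err_ != hipSuccess) {                                         \
      std::fprintf(stderr, "HIP error %d at %s:%d\n",                 \
                   static_cast<int>(err_), __FILE__, __LINE__);       \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

int main() {
  const int src_dev = 0, dst_dev = 1;   // two APUs in the same node (assumed IDs)
  const size_t bytes = 256ull << 20;    // 256 MiB transfer (illustrative size)

  // Check that the source APU can directly access the destination APU's memory.
  int can_access = 0;
  HIP_CHECK(hipDeviceCanAccessPeer(&can_access, src_dev, dst_dev));
  if (!can_access) { std::fprintf(stderr, "no peer access\n"); return 1; }

  // Allocate a buffer on each device and enable peer access from src to dst.
  void *src_buf = nullptr, *dst_buf = nullptr;
  HIP_CHECK(hipSetDevice(src_dev));
  HIP_CHECK(hipDeviceEnablePeerAccess(dst_dev, 0));
  HIP_CHECK(hipMalloc(&src_buf, bytes));
  HIP_CHECK(hipSetDevice(dst_dev));
  HIP_CHECK(hipMalloc(&dst_buf, bytes));

  // Explicit inter-APU copy, issued asynchronously on a stream.
  HIP_CHECK(hipSetDevice(src_dev));
  hipStream_t stream;
  HIP_CHECK(hipStreamCreate(&stream));
  HIP_CHECK(hipMemcpyPeerAsync(dst_buf, dst_dev, src_buf, src_dev, bytes, stream));
  HIP_CHECK(hipStreamSynchronize(stream));

  HIP_CHECK(hipStreamDestroy(stream));
  HIP_CHECK(hipFree(src_buf));
  HIP_CHECK(hipSetDevice(dst_dev));
  HIP_CHECK(hipFree(dst_buf));
  return 0;
}

On MI300A the allocator matters as well: buffers obtained via hipMalloc and system-allocated (unified) memory can take different paths across Infinity Fabric, which is one of the design choices the paper examines.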
August 24, 2025 by hgpu