Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
KTH Royal Institute of Technology, Stockholm, Sweden
arXiv:2410.00801 [cs.DC] (1 Oct 2024)
@misc{schieffer2024understandingdatamovementamd,
      title={Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric},
      author={Gabin Schieffer and Ruimin Shi and Stefano Markidis and Andreas Herten and Jennifer Faj and Ivy Peng},
      year={2024},
      eprint={2410.00801},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      url={https://arxiv.org/abs/2410.00801}
}
Modern GPU systems are constantly evolving to meet the needs of compute-intensive applications in scientific and machine learning domains. However, a gap typically remains between the hardware's capacity and the performance applications actually achieve. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movement on multi-GPU systems, exercising different communication options on AMD MI250X GPUs, including point-to-point and collective communication, as well as memory allocation strategies between GPUs and the host CPU. In a single-node setup with four GPUs, we show that direct peer-to-peer memory accesses between GPUs and the RCCL collective communication library outperform MPI-based solutions in terms of memory/communication latency and bandwidth. Our test and evaluation method serves as a basis for validating memory and communication strategies on a given system and for improving applications on AMD multi-GPU computing systems.
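For context, the direct peer-to-peer path the paper evaluates is exposed through the HIP runtime. Below is a minimal sketch, assuming two GPUs (or MI250X GCDs) visible as devices 0 and 1, of timing repeated peer-to-peer copies over Infinity Fabric with hipMemcpyPeerAsync; the device IDs, buffer size, iteration count, and HIP_CHECK helper are illustrative assumptions, not the authors' benchmark code.

```cpp
// Minimal sketch: timing a direct GPU-to-GPU (peer-to-peer) copy with HIP.
// Build (assumption): hipcc p2p_sketch.cpp -o p2p_sketch
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

#define HIP_CHECK(call)                                                \
  do {                                                                 \
    hipError_t e = (call);                                             \
    if (e != hipSuccess) {                                             \
      fprintf(stderr, "HIP error %s at %s:%d\n",                       \
              hipGetErrorString(e), __FILE__, __LINE__);               \
      exit(EXIT_FAILURE);                                              \
    }                                                                  \
  } while (0)

int main() {
  const int src = 0, dst = 1;        // two devices on the same node (assumed IDs)
  const size_t bytes = 1ull << 28;   // 256 MiB transfer (illustrative size)
  const int iters = 100;

  int canAccess = 0;
  HIP_CHECK(hipDeviceCanAccessPeer(&canAccess, src, dst));
  if (!canAccess) {
    fprintf(stderr, "Peer access not supported between %d and %d\n", src, dst);
    return 1;
  }

  // Enable peer access in both directions and allocate one buffer per device.
  void *srcBuf = nullptr, *dstBuf = nullptr;
  HIP_CHECK(hipSetDevice(src));
  HIP_CHECK(hipDeviceEnablePeerAccess(dst, 0));
  HIP_CHECK(hipMalloc(&srcBuf, bytes));
  HIP_CHECK(hipSetDevice(dst));
  HIP_CHECK(hipDeviceEnablePeerAccess(src, 0));
  HIP_CHECK(hipMalloc(&dstBuf, bytes));

  // Time repeated peer copies with HIP events on the source device.
  HIP_CHECK(hipSetDevice(src));
  hipEvent_t start, stop;
  HIP_CHECK(hipEventCreate(&start));
  HIP_CHECK(hipEventCreate(&stop));
  HIP_CHECK(hipEventRecord(start, 0));
  for (int i = 0; i < iters; ++i)
    HIP_CHECK(hipMemcpyPeerAsync(dstBuf, dst, srcBuf, src, bytes, 0));
  HIP_CHECK(hipEventRecord(stop, 0));
  HIP_CHECK(hipEventSynchronize(stop));

  float ms = 0.0f;
  HIP_CHECK(hipEventElapsedTime(&ms, start, stop));
  double gbps = (double)bytes * iters / (ms * 1e-3) / 1e9;
  printf("Peer-to-peer bandwidth %d -> %d: %.1f GB/s\n", src, dst, gbps);

  HIP_CHECK(hipSetDevice(src));
  HIP_CHECK(hipFree(srcBuf));
  HIP_CHECK(hipSetDevice(dst));
  HIP_CHECK(hipFree(dstBuf));
  return 0;
}
```

For the collective case compared in the paper, RCCL exposes the same API as NCCL, so an analogous sketch would replace the copy loop with ncclAllReduce calls over a communicator spanning the four GPUs.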
October 6, 2024 by hgpu