
Predicting GPUDirect Benefits for HPC Workloads

Harsh Khetawat, Nikhil Jain, Abhinav Bhatele, Frank Mueller
Department of Computer Science, North Carolina State University
The 32nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP’24), 2024

@inproceedings{khetawat:pdp2024,
  author    = {Khetawat, Harsh and Jain, Nikhil and Bhatele, Abhinav and Mueller, Frank},
  title     = {Predicting {GPUDirect} Benefits for {HPC} Workloads},
  booktitle = {Proceedings of the 32nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing},
  series    = {PDP ’24},
  year      = {2024},
  month     = mar
}


Graphics processing units (GPUs) are becoming increasingly popular in modern HPC systems. Hardware for data movement to and from GPUs, such as NVLink and GPUDirect, has reduced latencies, increased throughput, and eliminated redundant copies. In this work, we use discrete event simulations to explore the impact of different communication paradigms on the messaging performance of scientific applications running on multi-GPU nodes. First, we extend an existing simulation framework to model data movement on GPU-based clusters. Second, we implement support for simulating communication paradigms such as GPUDirect for point-to-point messages and collectives. Finally, we study the impact of parameters such as the number of GPUs per node and the use of GPUDirect on communication performance. We validate the framework and then simulate traces from GPU-enabled applications on a fat-tree based cluster. Simulation results uncover both strengths and weaknesses of GPUDirect, depending on the application and its usage of communication primitives.
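
The two communication paradigms the paper simulates can be made concrete with a short example. Below is a minimal sketch (not taken from the paper) contrasting the conventional staged path, where a message is copied through host memory before being sent, with a GPUDirect-style path, where a CUDA-aware MPI library is handed a device pointer directly and the host staging copies disappear. The buffer names and the message size N are illustrative assumptions; the MPI and CUDA runtime calls are standard.

/* Sketch: staged host-copy transfer vs. a GPUDirect-style transfer
 * through a CUDA-aware MPI. Assumes two ranks, one GPU each.
 * Compile with a CUDA-aware MPI (e.g., mpicc linked against CUDA). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

#define N (1 << 20)  /* message size in floats (illustrative assumption) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *d_buf;  /* GPU-resident message buffer */
    cudaMalloc((void **)&d_buf, N * sizeof(float));

    /* Staged path: copy device -> host, send the host buffer,
     * and copy host -> device on the receiving side. */
    float *h_buf = (float *)malloc(N * sizeof(float));
    if (rank == 0) {
        cudaMemcpy(h_buf, d_buf, N * sizeof(float), cudaMemcpyDeviceToHost);
        MPI_Send(h_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(h_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_buf, N * sizeof(float), cudaMemcpyHostToDevice);
    }

    /* GPUDirect-style path: a CUDA-aware MPI accepts the device
     * pointer directly, eliminating both host staging copies. */
    if (rank == 0) {
        MPI_Send(d_buf, N, MPI_FLOAT, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, N, MPI_FLOAT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(h_buf);
    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

With GPUDirect RDMA, the second exchange lets the network adapter read and write GPU memory directly; the latency and copy-elimination benefit of that path, and when it does or does not translate into application-level gains, is what the paper's simulations quantify.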