
LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters

Kunming Zhang, Hanlong Liao, Guoming Tang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China
arXiv:2506.15595 [cs.DC] (18 Jun 2025)

Parallel computing with multiple GPUs has become the dominant paradigm for machine learning tasks, especially those of large language models (LLMs). To reduce the latency incurred by inter-GPU communication, a common practice for parallel tasks has been to allocate GPUs based on their physical proximity. However, this long-standing assumption has notable limitations, particularly in large-scale, heterogeneous GPU clusters where bandwidth distribution among GPUs is irregular. In this paper, we introduce LiteGD, a lightweight and dynamic GPU dispatching system based on a global perspective. To tackle the difficulty of storing massive GPU topology information, LiteGD adopts a computation-aware design that leverages a lightweight Transformer network trained on sampled data. Our customized network structure ensures both transferability and scalability. LiteGD also employs a bidirectional tree search over the data generated in the previous step, which identifies near-optimal GPU dispatching plans while reducing search overhead. We implement and evaluate LiteGD in both real and simulated GPU clusters with homogeneous and heterogeneous interconnects, respectively. Experimental results demonstrate that LiteGD consistently achieves high GPU bandwidth efficacy (approximately 90%) across various cluster configurations and 80% in a real-world H100 cluster, significantly outperforming conventional default and interconnect topology-aware dispatching methods, particularly in large-scale heterogeneous environments.
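
To make the bandwidth-aware dispatching idea concrete, below is a minimal, hypothetical sketch. It is not LiteGD's actual method (the paper uses a lightweight Transformer predictor plus a bidirectional tree search); instead it shows the underlying problem with a simple greedy stand-in: given a pairwise effective-bandwidth matrix for a heterogeneous cluster, pick k GPUs whose bottleneck (minimum pairwise) bandwidth stays high. All names (`dispatch_gpus`, the 16-GPU toy topology) are illustrative assumptions.

```python
# Toy illustration of bandwidth-aware GPU dispatching (hypothetical sketch,
# not LiteGD's algorithm): choose k GPUs from a cluster so that the group's
# bottleneck pairwise bandwidth is as high as possible.
import itertools
import random


def min_pairwise_bandwidth(bw, group):
    """Bottleneck (minimum) bandwidth among all GPU pairs in `group`."""
    return min(bw[i][j] for i, j in itertools.combinations(group, 2))


def dispatch_gpus(bw, k):
    """Greedy dispatch: seed with the best-connected pair, then repeatedly
    add the GPU that keeps the group's bottleneck bandwidth highest."""
    n = len(bw)
    best_pair = max(itertools.combinations(range(n), 2),
                    key=lambda p: bw[p[0]][p[1]])
    group = list(best_pair)
    while len(group) < k:
        candidates = [g for g in range(n) if g not in group]
        group.append(max(candidates,
                         key=lambda g: min_pairwise_bandwidth(bw, group + [g])))
    return sorted(group)


if __name__ == "__main__":
    random.seed(0)
    n = 16  # hypothetical 16-GPU cluster with irregular interconnect bandwidth
    bw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # GPUs in the same 4-GPU node get a fast NVLink-like link;
            # inter-node links are slower and irregular (heterogeneous fabric).
            link = 400.0 if i // 4 == j // 4 else random.choice([50.0, 100.0, 200.0])
            bw[i][j] = bw[j][i] = link
    chosen = dispatch_gpus(bw, k=4)
    print("dispatched GPUs:", chosen,
          "bottleneck bandwidth (GB/s):", min_pairwise_bandwidth(bw, chosen))
```

In this toy version the full bandwidth matrix is stored explicitly and searched greedily; the paper's point is that this does not scale to large heterogeneous clusters, which is why LiteGD replaces the explicit topology store with a learned, lightweight predictor and a bidirectional tree search.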

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org