HetCCL: Accelerating LLM Training with Heterogeneous GPUs
Seoul National University
arXiv:2601.22585 [cs.DC] (30 Jan 2026)
@misc{kim2026hetccl,
      title={HetCCL: Accelerating LLM Training with Heterogeneous GPUs},
      author={Heehoon Kim and Jaehwan Lee and Taejeoung Kim and Jongwon Park and Jinpyo Kim and Pyongwon Suh and Ryan H. Choi and Sangwoo Lee and Jaejin Lee},
      year={2026},
      eprint={2601.22585},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      url={https://arxiv.org/abs/2601.22585}
}
The rapid growth of large language models is driving organizations to expand their GPU clusters, often with GPUs from multiple vendors. However, current deep learning frameworks lack support for collective communication across heterogeneous GPUs, leading to inefficiency and higher costs. We present HetCCL, a collective communication library that unifies vendor-specific backends and enables RDMA-based communication across GPUs without requiring driver modifications. HetCCL introduces two novel mechanisms that enable cross-vendor communication while leveraging the optimized vendor libraries NVIDIA NCCL and AMD RCCL. Evaluations on a multi-vendor GPU cluster show that HetCCL matches NCCL and RCCL performance in homogeneous setups while uniquely scaling in heterogeneous environments, enabling practical, high-performance training with both NVIDIA and AMD GPUs without changes to existing deep learning applications.
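For context, the following minimal C++ sketch (not HetCCL's actual API) illustrates the kind of backend unification the abstract describes: because RCCL intentionally mirrors the NCCL API, a thin wrapper can route the same allreduce call to either vendor library on a homogeneous node. The names hetAllReduce, USE_RCCL, and gpuStream_t are hypothetical illustrations; the cross-vendor RDMA path that is the paper's main contribution is not shown here.

#if defined(USE_RCCL)              // hypothetical build flag: AMD backend
  #include <rccl/rccl.h>           // RCCL exposes NCCL-compatible symbols and types
  #include <hip/hip_runtime.h>
  using gpuStream_t = hipStream_t;
#else                              // NVIDIA backend
  #include <nccl.h>
  #include <cuda_runtime.h>
  using gpuStream_t = cudaStream_t;
#endif

// Hypothetical unified entry point with an NCCL-style signature.
ncclResult_t hetAllReduce(const void* sendbuf, void* recvbuf, size_t count,
                          ncclDataType_t dtype, ncclRedOp_t op,
                          ncclComm_t comm, gpuStream_t stream) {
  // Both NCCL and RCCL implement ncclAllReduce with this exact signature,
  // so on a single-vendor node the wrapper is a plain pass-through.
  return ncclAllReduce(sendbuf, recvbuf, count, dtype, op, comm, stream);
}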
February 8, 2026 by hgpu