Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters
Georgia Institute of Technology, Atlanta, GA, United States
31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP’26), 2026
@inproceedings{han2026scaling,
  title     = {Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters},
  author    = {Han, Ruobing and Kim, Hyesoon},
  booktitle = {Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming},
  pages     = {355--368},
  year      = {2026}
}
The growing demand for GPU resources has led to widespread shortages in data centers, prompting the exploration of CPUs as an alternative for executing GPU programs. While prior research supports executing GPU programs on single CPUs, these approaches struggle to achieve competitive performance due to the computational capacity gap between GPUs and CPUs. To further improve performance, we introduce CuCC, a framework that scales GPU-to-CPU migration to CPU clusters and utilizes distributed CPU nodes to execute GPU programs. Compared to single-CPU execution, CPU cluster execution requires cross-node communication to maintain data consistency. We present the CuCC execution workflow and communication optimizations, which aim to reduce network overhead. Evaluations demonstrate that CuCC achieves high scalability on large-scale CPU clusters and delivers runtimes approaching those of GPUs. In terms of cluster-wide throughput, CuCC enables CPUs to achieve an average of 2.59x higher throughput than GPUs.
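The core challenge the abstract names is that once a GPU kernel's thread blocks are spread across CPU nodes, the nodes must communicate to preserve the shared-global-memory view the kernel was written against. The sketch below is a minimal hand-written MPI illustration of that idea only, not CuCC's actual workflow: the SAXPY kernel, the contiguous block partitioning, and the MPI_Allgatherv exchange are all assumptions made for illustration.

// Illustrative sketch only (not CuCC): distribute a GPU-style grid of thread
// blocks across CPU nodes with MPI, then restore a consistent view of the
// output buffer via an all-gather.
#include <mpi.h>
#include <algorithm>
#include <cstdio>
#include <vector>

// CUDA-style SAXPY "kernel body", executed for one thread block on a CPU core.
static void saxpy_block(int block, int block_dim, int n, float a,
                        const std::vector<float>& x, std::vector<float>& y) {
    for (int t = 0; t < block_dim; ++t) {      // emulate the threads of the block
        int i = block * block_dim + t;         // global thread index
        if (i < n) y[i] = a * x[i] + y[i];
    }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n = 1 << 20, block_dim = 256;
    const int num_blocks = (n + block_dim - 1) / block_dim;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    // Static partition of the grid: each rank owns a contiguous range of blocks,
    // which maps to a contiguous slice of y.
    const int blocks_per_rank = (num_blocks + nprocs - 1) / nprocs;
    const int first = rank * blocks_per_rank;
    const int last  = std::min(num_blocks, first + blocks_per_rank);
    for (int b = first; b < last; ++b)
        saxpy_block(b, block_dim, n, 3.0f, x, y);

    // Cross-node communication for data consistency: every rank gathers the
    // slices computed by the other ranks, so all nodes again see the same y.
    std::vector<int> counts(nprocs), displs(nprocs);
    for (int r = 0; r < nprocs; ++r) {
        int lo = std::min(n, r * blocks_per_rank * block_dim);
        int hi = std::min(n, (r + 1) * blocks_per_rank * block_dim);
        counts[r] = hi - lo;
        displs[r] = lo;
    }
    MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                   y.data(), counts.data(), displs.data(), MPI_FLOAT,
                   MPI_COMM_WORLD);

    if (rank == 0) std::printf("y[0] = %f\n", y[0]);  // expect 5.0
    MPI_Finalize();
    return 0;
}

The all-gather after the kernel stands in for the consistency traffic the abstract refers to; it is exactly this kind of network overhead that the paper's communication optimizations aim to reduce.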
February 8, 2026 by hgpu