
Energy-Efficient GPU Clusters Scheduling for Deep Learning

Diandian Gu, Xintong Xie, Gang Huang, Xin Jin, Xuanzhe Liu
Peking University
arXiv:2304.06381 [cs.DC] (13 Apr 2023)

@misc{gu2023energyefficient,
   title={Energy-Efficient GPU Clusters Scheduling for Deep Learning},
   author={Diandian Gu and Xintong Xie and Gang Huang and Xin Jin and Xuanzhe Liu},
   year={2023},
   eprint={2304.06381},
   archivePrefix={arXiv},
   primaryClass={cs.DC}
}



Training deep neural networks (DNNs) is a major workload in datacenters today, driving rapid growth in energy consumption. It is therefore important to reduce energy consumption while still completing deep learning (DL) training jobs early. In this paper, we propose PowerFlow, a GPU cluster scheduler that reduces the average Job Completion Time (JCT) under an energy budget. We first present performance models for DL training jobs that predict throughput and energy consumption under different configurations. Based on these performance models, PowerFlow dynamically allocates GPUs and adjusts the GPU-level or job-level configurations of DL training jobs. PowerFlow applies network packing and buddy allocation to job placement, thus avoiding the extra energy consumed by cluster fragmentation. Evaluation results show that, under the same energy consumption, PowerFlow improves the average JCT by 1.57× to 3.39× at most, compared with competitive baselines.
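The abstract names buddy allocation as one of PowerFlow's placement techniques but does not describe how it is applied. What follows is a generic, textbook buddy allocator over GPU indices, offered only to illustrate the idea; it is not PowerFlow's implementation, and the class and method names are hypothetical. It assumes a power-of-two GPU pool whose adjacent indices are topologically close (for example, on the same server or switch). Each request is rounded up to a power of two and served from an aligned block, and a freed block is merged with its free "buddy", so idle GPUs stay in large contiguous pieces rather than scattered fragments.

```python
from typing import Dict, Optional, Set


class BuddyGPUAllocator:
    """Textbook buddy allocator over GPU indices 0..num_gpus-1 (hypothetical sketch).

    Assumes num_gpus is a power of two and that adjacent indices share a
    server or network switch, so an aligned block is a compact placement.
    """

    def __init__(self, num_gpus: int) -> None:
        if num_gpus <= 0 or num_gpus & (num_gpus - 1):
            raise ValueError("num_gpus must be a positive power of two")
        self.num_gpus = num_gpus
        # block size -> start indices of free blocks aligned to that size
        self.free_blocks: Dict[int, Set[int]] = {num_gpus: {0}}
        # start index -> size of each live allocation
        self.allocated: Dict[int, int] = {}

    @staticmethod
    def _round_up_pow2(n: int) -> int:
        size = 1
        while size < n:
            size *= 2
        return size

    def allocate(self, n: int) -> Optional[range]:
        """Return a contiguous, aligned block of >= n GPUs, or None if none fits."""
        size = self._round_up_pow2(n)
        # Find the smallest free block that is large enough.
        candidate = size
        while candidate <= self.num_gpus and not self.free_blocks.get(candidate):
            candidate *= 2
        if candidate > self.num_gpus:
            return None
        start = min(self.free_blocks[candidate])
        self.free_blocks[candidate].remove(start)
        # Split in halves: keep the lower half, return the upper half to the free list.
        while candidate > size:
            candidate //= 2
            self.free_blocks.setdefault(candidate, set()).add(start + candidate)
        self.allocated[start] = size
        return range(start, start + size)

    def free(self, start: int) -> None:
        """Release a block and merge it with its free buddy as far as possible."""
        size = self.allocated.pop(start)
        while size < self.num_gpus:
            buddy = start ^ size  # aligned buddies differ only in the bit equal to size
            peers = self.free_blocks.get(size, set())
            if buddy not in peers:
                break
            peers.remove(buddy)
            start = min(start, buddy)
            size *= 2
        self.free_blocks.setdefault(size, set()).add(start)
```

On an 8-GPU pool, for example, a 3-GPU request is rounded up to the aligned block `range(0, 4)`, and two 2-GPU requests then receive `range(4, 6)` and `range(6, 8)`; freeing all three merges them back into a single 8-GPU block. The rounding wastes some capacity inside a block (internal fragmentation) in exchange for keeping free space in large, aligned, mergeable pieces.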

