Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

Dan Zhao, Siddharth Samsi, Joseph McDonald, Baolin Li, David Bestor, Michael Jones, Devesh Tiwari, Vijay Gadepally
MIT
arXiv:2402.18593 [cs.AR] (25 Feb 2024)

@inproceedings{Zhao_2023,
   series={SoCC ’23},
   title={Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale},
   url={http://dx.doi.org/10.1145/3620678.3624793},
   doi={10.1145/3620678.3624793},
   booktitle={Proceedings of the 2023 ACM Symposium on Cloud Computing},
   publisher={ACM},
   author={Zhao, Dan and Samsi, Siddharth and McDonald, Joseph and Li, Baolin and Bestor, David and Jones, Michael and Tiwari, Devesh and Gadepally, Vijay},
   year={2023},
   month={oct},
   collection={SoCC ’23}
}


As research and deployment of AI grows, the computational burden to support and sustain its progress inevitably does too. To train or fine-tune state-of-the-art models in NLP, computer vision, etc., some form of AI hardware acceleration is virtually a requirement. Recent large language models require considerable resources to train and deploy, resulting in significant energy usage, potential carbon emissions, and massive demand for GPUs and other hardware accelerators. However, this surge carries large implications for energy sustainability at the HPC/datacenter level. In this paper, we study the aggregate effect of power-capping GPUs on GPU temperature and power draw at a research supercomputing center. With the right amount of power-capping, we show significant decreases in both temperature and power draw, reducing power consumption and potentially improving hardware life-span with minimal impact on job performance. While power-capping reduces power draw by design, the aggregate system-wide effect on overall energy consumption is less clear; for instance, if users notice job performance degradation from GPU power-caps, they may request additional GPU-jobs to compensate, negating any energy savings or even worsening energy consumption. To our knowledge, our work is the first to conduct and make available a detailed analysis of the effects of GPU power-capping at the supercomputing scale. We hope our work will inspire HPCs/datacenters to further explore, evaluate, and communicate the impact of power-capping AI hardware accelerators for more sustainable AI.
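In practice, the kind of GPU power-capping studied here is typically applied through NVIDIA's `nvidia-smi` CLI and monitored by querying per-GPU power draw and temperature. The sketch below is a hypothetical illustration of that workflow, not the paper's actual tooling; the 250 W cap and the sample telemetry values are invented for demonstration.

```python
# Hypothetical sketch of GPU power-capping and telemetry parsing.
# Assumes NVIDIA's nvidia-smi CLI; the cap value (250 W) and sample
# readings below are illustrative, not taken from the paper.
import subprocess


def set_power_cap(gpu_index: int, watts: int) -> None:
    """Apply a software power cap to one GPU (requires admin rights)."""
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
        check=True,
    )


def parse_power_temp(csv_output: str) -> list[tuple[float, int]]:
    """Parse the output of:
       nvidia-smi --query-gpu=power.draw,temperature.gpu \
                  --format=csv,noheader,nounits
    into (watts, celsius) pairs, one per GPU."""
    readings = []
    for line in csv_output.strip().splitlines():
        power, temp = (field.strip() for field in line.split(","))
        readings.append((float(power), int(temp)))
    return readings


# Example with captured sample output for two GPUs:
sample = "231.45, 67\n198.02, 61\n"
print(parse_power_temp(sample))  # [(231.45, 67), (198.02, 61)]
```

Polling this telemetry before and after lowering the cap is the simplest way for an individual user to reproduce, at small scale, the power/temperature comparison the paper performs fleet-wide.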


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
