Improving the Efficiency of GPU Clusters
Platform Computing
HPC Day (High Performance Computing Day), Stanford University, 2010
@misc{mcmillan2010improving,
  title        = {Improving the Efficiency of GPU Clusters},
  author       = {McMillan, Bill},
  howpublished = {HPC Day, Stanford University},
  year         = {2010}
}
If you perceive more than a little excitement around the topic of Graphics Processing Units (GPUs) in High-Performance Computing (HPC), there is good reason for it. HPC is all about performance, after all, and it is not every day that a new technology promises an order-of-magnitude boost in processing power. A variety of new GPU processing elements are rapidly finding their way into HPC clusters as developers scramble to restructure algorithms to take maximum advantage of these new capabilities. The industry is taking note of more than just performance: the economics are compelling too. In data centers where facilities, power, and cooling dominate the cost of the compute infrastructure itself, achieving better density and performance-to-power ratios warrants serious attention.

As we all know, achieving real-world performance is about much more than the raw performance of the underlying hardware. Just as a highly efficient power plant makes little sense when it feeds a distribution network that loses 70% of its power in transmission, the same holds for HPC clusters: efficiency matters.

While many factors affect efficiency, this paper focuses on the critical role of scheduling and workload management in getting the most out of your GPU cluster. By "working smarter" and driving GPU clusters to dramatically higher utilization and throughput, organizations can not only save on infrastructure and management costs but also boost their productivity.
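The abstract does not describe how Platform's workload manager actually places GPU jobs, so the following is only a minimal illustrative sketch of the general idea of GPU-aware scheduling it alludes to: jobs that declare a GPU requirement are matched against hosts with free devices, so that expensive GPUs do not sit idle while work waits behind CPU-only jobs. All names here (Host, Job, dispatch) are hypothetical and are not taken from Platform LSF.

from collections import deque
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    free_gpus: int          # GPUs on this host not currently assigned to a job

@dataclass
class Job:
    name: str
    gpus: int               # GPUs requested; 0 means a CPU-only job

def dispatch(queue, hosts):
    """One greedy scheduling pass: place each queued job on the first host
    that can satisfy its GPU request; jobs that do not fit stay queued."""
    placed = []
    for _ in range(len(queue)):
        job = queue.popleft()
        host = next((h for h in hosts if h.free_gpus >= job.gpus), None)
        if host is not None:
            host.free_gpus -= job.gpus
            placed.append((job.name, host.name))
        else:
            queue.append(job)   # no capacity this pass; retry next cycle
    return placed

if __name__ == "__main__":
    hosts = [Host("node01", free_gpus=2), Host("node02", free_gpus=1)]
    queue = deque([Job("md-sim", gpus=2), Job("render", gpus=1), Job("post", gpus=0)])
    print(dispatch(queue, hosts))
    # [('md-sim', 'node01'), ('render', 'node02'), ('post', 'node01')]

A production workload manager layers policies such as fair-share, preemption, and topology awareness on top of this basic matching step; the sketch only shows why GPU-aware placement keeps utilization high.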