Interference-driven resource management for GPU-based heterogeneous clusters

Rajat Phull, Cheng-Hong Li, Kunal Rao, Srihari Cadambi, Srimat Chakradhar
NEC Laboratories America, Inc., Suite 200, 4 Independence Way, Princeton, NJ 08540, USA
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing (HPDC ’12), 2012


@inproceedings{phull2012interference,
   title={Interference-driven resource management for GPU-based heterogeneous clusters},
   author={Phull, R. and Li, C.H. and Rao, K. and Cadambi, S. and Chakradhar, S.},
   booktitle={Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing},
   year={2012}
}








GPU-based clusters are increasingly being deployed in HPC environments to accelerate a variety of scientific applications. Despite their growing popularity, the GPU devices themselves are under-utilized even for many computationally intensive jobs. This stems from the fact that the typical GPU usage model is one in which a host processor periodically offloads computationally intensive portions of an application to the coprocessor. Since some portions of code cannot be offloaded to the GPU (for example, code performing network communication in MPI applications), this usage model results in periods of time when the GPU is idle. GPUs could be time-shared across jobs to "fill" these idle periods, but unlike CPU resources such as the cache, the effects of sharing the GPU are not well understood. Specifically, two jobs that time-share a single GPU will experience resource contention and interfere with each other. The resulting slowdown could lead to missed job deadlines. Current cluster managers do not support GPU sharing, but instead dedicate GPUs to a job for the job's lifetime. In this paper, we present a framework to predict and handle interference when two or more jobs time-share GPUs in HPC clusters. Our framework consists of an analysis model and a dynamic interference detection and response mechanism that detects excessive interference and restarts the interfering jobs on different nodes. We implement our framework in Torque, an open-source cluster manager, and, using real workloads on an HPC cluster, show that interference-aware two-job colocation (although our method is applicable to colocating more than two jobs) improves GPU utilization by 25%, reduces a job's waiting time in the queue by 39%, and improves job latencies by around 20%.
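The detection-and-response idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the slowdown metric (ratio of a job's solo progress rate to its colocated progress rate), and the threshold value are all illustrative assumptions:

```python
# Hedged sketch of an interference-response policy: compare a job's
# progress rate when it runs alone on a GPU against its rate while
# colocated, and flag it for restart on another node if the slowdown
# exceeds a tolerance. All names and thresholds here are hypothetical.

def slowdown(solo_rate: float, colocated_rate: float) -> float:
    """Relative slowdown of a job time-sharing a GPU (1.0 = no interference)."""
    if colocated_rate <= 0:
        raise ValueError("colocated progress rate must be positive")
    return solo_rate / colocated_rate

def should_migrate(solo_rate: float, colocated_rate: float,
                   max_slowdown: float = 1.5) -> bool:
    """Decide whether interference is excessive enough to restart the job
    on a different node (the response action described in the paper)."""
    return slowdown(solo_rate, colocated_rate) > max_slowdown

# Example: a job doing 100 iterations/s alone but 60 iterations/s when
# sharing a GPU has a slowdown of ~1.67, above the 1.5 tolerance.
print(should_migrate(100.0, 60.0))  # True
print(should_migrate(100.0, 80.0))  # False (slowdown 1.25)
```

A real scheduler would combine such a runtime check with the offline analysis model the abstract mentions, using predicted interference to avoid bad colocations in the first place and the runtime check as a safety net.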

