A Dynamic Resource Management System for Network-Attached Accelerator Clusters

Suraj Prabhakaran, Mohsin Iqbal, Sebastian Rinke, Felix Wolf
German Research School for Simulation Sciences, Laboratory for Parallel Programming, 52062 Aachen, Germany
42nd International Conference on Parallel Processing Workshops (ICPPW), Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS), 2013

@inproceedings{prabhakaran_ea:2013:dynrm_nac,
   author={Prabhakaran, Suraj and Iqbal, Mohsin and Rinke, Sebastian and Wolf, Felix},
   month={oct},
   title={A Dynamic Resource Management System for Network-Attached Accelerator Clusters},
   booktitle={Proc. of the 42nd International Conference on Parallel Processing Workshops (ICPPW), Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS), Lyon, France},
   year={2013}
}


Over the years, cluster systems have become increasingly heterogeneous as cluster nodes have been equipped with one or more accelerators such as graphics processing units (GPUs). These devices are typically attached to a compute node via PCI Express. As a consequence, batch systems such as TORQUE/Maui and SLURM have been extended to be aware of those additional resources tightly coupled with compute nodes. Recent advances in accelerator technology have made it possible to use network-attached accelerators in addition to node-attached accelerators. However, current batch systems do not support this new usage scenario. This work focuses on batch-system support for allocating network-attached accelerators. The most important feature of the proposed batch system is its ability to dynamically allocate network-attached accelerators to jobs at application runtime. We discuss our extensions to the TORQUE/Maui batch system and elaborate on its features in the Dynamic Accelerator-Cluster Architecture, which describes an integration of network-attached accelerators into a cluster system. We also evaluate the dynamic allocation scenarios and show how batch systems can be designed to provide support for more flexible and dynamic cluster systems.
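
To make the idea of allocating network-attached accelerators at application runtime concrete, the following is a minimal sketch and not the authors' implementation. It does not use any TORQUE or Maui API; the class `AcceleratorPool`, its methods `request` and `release`, and the identifiers `acc0`, `job1`, and so on are hypothetical names chosen for illustration. It assumes a single shared pool and an all-or-nothing grant policy, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class AcceleratorPool:
    """Toy model of a shared pool of network-attached accelerators (hypothetical)."""
    free: set = field(default_factory=set)      # accelerator ids not assigned to any job
    owned: dict = field(default_factory=dict)   # job id -> set of accelerator ids

    def request(self, job_id, count):
        """Grant `count` accelerators to a running job, or nothing at all."""
        if count > len(self.free):
            return None  # rejected; the job keeps whatever it already holds
        granted = {self.free.pop() for _ in range(count)}
        self.owned.setdefault(job_id, set()).update(granted)
        return granted

    def release(self, job_id, accelerators=None):
        """Return some (or all) of a job's accelerators to the free pool."""
        held = self.owned.get(job_id, set())
        returned = set(held) if accelerators is None else held & set(accelerators)
        held -= returned          # in-place update of the job's set
        self.free |= returned
        if not held:
            self.owned.pop(job_id, None)
        return returned


pool = AcceleratorPool(free={"acc0", "acc1", "acc2"})
first = pool.request("job1", 2)    # job1 grows by two accelerators while running
second = pool.request("job2", 2)   # only one accelerator is free, so this is rejected
pool.release("job1")               # job1 hands its accelerators back
third = pool.request("job2", 2)    # the same request now succeeds
```

In this sketch, because accelerators are not bound to a particular compute node, a job can grow or shrink its accelerator set during execution, and a rejected request leaves the job running with its current resources.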
