A power-aware symbiotic scheduling algorithm for concurrent GPU kernels
NSF Center for High-Performance Reconfigurable Computing (CHREC), Department of Electrical and Computer Engineering, The George Washington University, Washington, DC, USA
The 21st IEEE International Conference on Parallel and Distributed Systems, 2015
@inproceedings{li2015power,
title={A power-aware symbiotic scheduling algorithm for concurrent gpu kernels},
author={Li, Teng and Narayana, Vikram K and El-Ghazawi, Tarek},
booktitle={The 21st IEEE International Conference on Parallel and Distributed Systems (ICPADS 2015), IEEE},
year={2015}
}
The past several years have witnessed significant performance improvements in High-Performance Computing (HPC), due to the incorporation of GPUs as co-processors. On one hand, GPU devices are growing significantly in terms of the available number of cores and the memory hierarchy; as a result, effective utilization of the available GPU resources while limiting the system power consumption has become an issue of rising importance. On the other hand, GPU vendors are providing additional supporting features to make this easier, such as enabling concurrent execution of multiple kernels, and providing on-board power sensors that can accessed through software. Amidst these new developments, we are faced with new opportunities for efficiently scheduling GPU computational kernels under performance and power constraints. In this paper, we propose a power-aware scheduling technique that carries out both performance and power optimizations for concurrent GPU kernels. We have observed that for GPU kernels that are deployed for concurrent execution, the order in which the programmer specifies their invocation can significantly alter the execution time and the power draw. We attribute this behavior to the relative synergy (or lack thereof) among kernels that are launched within close proximity of each other. Accordingly, we define performance metrics for computing the extent to which kernels are symbiotic, as well as power metrics for reducing the overall power consumption. Both metrics are estimated by modeling the kernels’ complementary resource requirements and execution characteristics. We then propose a power-aware symbiotic scheduling algorithm to obtain a concurrent kernel launch schedule with improved performance and reduced power consumption. Experimental studies are conducted on the Cray XK7 supercomputer with an NVIDIA K20 GPU in each node. The results demonstrate the efficacy of the proposed algorithm-based approach, which can be readily adopted by programmers with minimal programming effort and risk.
February 1, 2016 by hgpu