high performance computing on graphics processing units: hgpu.org


A power-aware symbiotic scheduling algorithm for concurrent GPU kernels

Teng Li, Vikram K. Narayana, Tarek El-Ghazawi

NSF Center for High-Performance Reconfigurable Computing (CHREC), Department of Electrical and Computer Engineering, The George Washington University, Washington, DC, USA

The 21st IEEE International Conference on Parallel and Distributed Systems, 2015

@inproceedings{li2015power,
  title={A power-aware symbiotic scheduling algorithm for concurrent gpu kernels},
  author={Li, Teng and Narayana, Vikram K and El-Ghazawi, Tarek},
  booktitle={The 21st IEEE International Conference on Parallel and Distributed Systems (ICPADS 2015), IEEE},
  year={2015}
}


The past several years have witnessed significant performance improvements in High-Performance Computing (HPC), due to the incorporation of GPUs as co-processors. On one hand, GPU devices are growing significantly in terms of the available number of cores and the memory hierarchy; as a result, effective utilization of the available GPU resources while limiting the system power consumption has become an issue of rising importance. On the other hand, GPU vendors are providing additional supporting features to make this easier, such as enabling concurrent execution of multiple kernels and providing on-board power sensors that can be accessed through software. These developments open new opportunities for efficiently scheduling GPU computational kernels under performance and power constraints. In this paper, we propose a power-aware scheduling technique that carries out both performance and power optimizations for concurrent GPU kernels. We have observed that, for GPU kernels deployed for concurrent execution, the order in which the programmer specifies their invocation can significantly alter the execution time and the power draw. We attribute this behavior to the relative synergy (or lack thereof) among kernels that are launched in close proximity to each other. Accordingly, we define performance metrics for computing the extent to which kernels are symbiotic, as well as power metrics for reducing the overall power consumption. Both metrics are estimated by modeling the kernels’ complementary resource requirements and execution characteristics. We then propose a power-aware symbiotic scheduling algorithm to obtain a concurrent kernel launch schedule with improved performance and reduced power consumption. Experimental studies are conducted on the Cray XK7 supercomputer with an NVIDIA K20 GPU in each node. The results demonstrate the efficacy of the proposed algorithm-based approach, which can be readily adopted by programmers with minimal programming effort and risk.
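
To make the idea of ordering kernels by how well they complement each other more concrete, here is a minimal Python sketch. It is not the algorithm from the paper, whose metrics and models the abstract does not spell out. Everything in it is an assumption made for illustration: the `KernelProfile` fields, the `symbiosis_score` formula (a penalty for oversubscribing compute or memory bandwidth plus a weighted power term), the `power_weight` parameter, and the greedy nearest-neighbour ordering in `greedy_launch_order`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KernelProfile:
    """Hypothetical per-kernel profile; all fields are illustrative assumptions."""
    name: str
    compute_util: float  # fraction of compute throughput used when run alone (0..1)
    memory_util: float   # fraction of memory bandwidth used when run alone (0..1)
    power_watts: float   # estimated power draw when run alone


def symbiosis_score(a: KernelProfile, b: KernelProfile,
                    power_weight: float = 0.01) -> float:
    """Higher is better: little resource contention and low combined power.

    This formula is an assumed stand-in, not the paper's metric.
    """
    # Contention is the amount by which the pair oversubscribes each resource.
    contention = (max(0.0, a.compute_util + b.compute_util - 1.0)
                  + max(0.0, a.memory_util + b.memory_util - 1.0))
    combined_power = a.power_watts + b.power_watts
    return -(contention + power_weight * combined_power)


def greedy_launch_order(kernels: List[KernelProfile],
                        power_weight: float = 0.01) -> List[KernelProfile]:
    """Return a launch order in which each kernel is followed by the
    remaining kernel that scores best against it (greedy heuristic)."""
    if not kernels:
        return []
    remaining = list(kernels)
    order = [remaining.pop(0)]  # keep the programmer's first kernel as the seed
    while remaining:
        last = order[-1]
        best = max(remaining,
                   key=lambda k: symbiosis_score(last, k, power_weight))
        remaining.remove(best)
        order.append(best)
    return order


if __name__ == "__main__":
    profiles = [
        KernelProfile("A", 0.9, 0.1, 100.0),  # compute-bound
        KernelProfile("B", 0.8, 0.2, 100.0),  # compute-bound
        KernelProfile("C", 0.1, 0.9, 80.0),   # memory-bound
        KernelProfile("D", 0.2, 0.8, 80.0),   # memory-bound
    ]
    print([k.name for k in greedy_launch_order(profiles)])
```

With profiles like these, the heuristic tends to interleave compute-bound and memory-bound kernels rather than launching similar kernels back to back, which is the intuition behind "symbiotic" ordering. In practice the profile values would come from measurement (for example, profiler counters and the on-board power sensors mentioned above) rather than hand-picked numbers.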

Tags: Algorithms, Computer science, CUDA, nVidia, Performance, Task scheduling, Tesla K20

February 1, 2016 by hgpu
