Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

Jianlong Zhong, Bingsheng He

School of Computer Engineering, Nanyang Technological University, Singapore, 639798

arXiv:1303.5164 [cs.DC], (21 Mar 2013)

Graphics processors, or GPUs, have recently been widely used as accelerators in shared environments such as clusters and clouds. In such shared environments, many kernels are submitted to GPUs from different users, and throughput is an important metric for performance and total cost of ownership. Despite the recently improved runtime support for concurrent GPU kernel executions, the GPU can be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet, a runtime system with dynamic slicing and scheduling techniques to improve the throughput of concurrent kernel executions on the GPU. With slicing, Kernelet divides a GPU kernel into multiple sub-kernels (namely slices). Each slice has tunable occupancy to allow co-scheduling with other slices and to fully utilize the GPU resources. We develop a novel and effective Markov chain-based performance model to guide scheduling decisions. Our experimental results demonstrate up to 31.1% and 23.4% performance improvement on NVIDIA Tesla C2050 and GTX680 GPUs, respectively.
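The slicing idea in the abstract can be illustrated with a minimal sketch. It is not Kernelet's implementation: it assumes a kernel is described only by its total number of thread blocks, that a slice is a contiguous range of block indices, and that slices from two kernels are simply alternated. The names `Slice`, `slice_kernel`, and `interleave` are hypothetical, and the round-robin order is a placeholder for the Markov chain-based model that guides scheduling in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slice:
    kernel: str        # hypothetical kernel name, used only for illustration
    block_offset: int  # index of the first thread block covered by this slice
    num_blocks: int    # number of thread blocks in this slice


def slice_kernel(kernel: str, total_blocks: int, slice_size: int) -> List[Slice]:
    """Split a grid of `total_blocks` thread blocks into contiguous slices.

    Every slice has `slice_size` blocks except possibly the last one,
    which takes whatever remains.
    """
    if total_blocks < 0 or slice_size <= 0:
        raise ValueError("total_blocks must be >= 0 and slice_size must be > 0")
    return [
        Slice(kernel, offset, min(slice_size, total_blocks - offset))
        for offset in range(0, total_blocks, slice_size)
    ]


def interleave(a: List[Slice], b: List[Slice]) -> List[Slice]:
    """Alternate slices from two kernels (placeholder policy, an assumption).

    A real scheduler would choose which slices to co-run, and their sizes,
    from a performance model rather than a fixed round-robin order.
    """
    order: List[Slice] = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            order.append(a[i])
        if i < len(b):
            order.append(b[i])
    return order
```

In a CUDA setting, each slice would presumably be launched as its own smaller grid, with `block_offset` added to the block index inside the kernel so that the slices together cover the original grid. That detail is an assumption of this sketch rather than something stated in the abstract.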

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 680, PTX, Task scheduling, Tesla C2050

March 23, 2013 by hgpu
