Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms

Yuan Wen, Zheng Wang, Michael F.P. O’Boyle

School of Informatics, The University of Edinburgh

The 21st annual IEEE International Conference on High Performance Computing (HiPC 2014), 2014

@article{wen2014smart,
  title={Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms},
  author={Wen, Yuan and Wang, Zheng and O’Boyle, Michael F.P.},
  year={2014}
}

Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high-performance computing. Such platforms are usually programmed using OpenCL, which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application-dedicated devices to platforms that need to support multiple concurrent user applications. Here there is a need to determine when and where to map different applications so as to best utilize the available heterogeneous hardware resources. In this paper, we present an efficient OpenCL task scheduling scheme which schedules multiple kernels from multiple programs on CPU/GPU heterogeneous platforms. It does this by determining at runtime which kernels are likely to best utilize a device. We show that speedup is a good scheduling priority function and develop a novel model that predicts a kernel’s speedup based on its static code structure. Our scheduler uses this prediction and runtime input data size to prioritize and schedule tasks. This technique is applied to a large set of concurrent OpenCL kernels. We evaluated our approach for system throughput and average turnaround time against competitive techniques on two different platforms: a Core i7/NVIDIA GTX590 platform and a Core i7/AMD Tahiti 7970 platform. For system throughput, we achieve, on average, a 1.21x and 1.25x improvement over the best competitors on the NVIDIA and AMD platforms respectively. Our approach reduces the turnaround time, on average, by at least 1.5x and 1.2x on the NVIDIA and AMD platforms respectively, when compared to alternative approaches.
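To make the idea of using speedup as a scheduling priority more concrete, here is a minimal Python sketch. It is not the authors' scheduler: the `Task` class, the `schedule` function, the greedy policy (an idle GPU takes the pending task with the highest predicted speedup, an idle CPU takes the one with the lowest) and all numbers are assumptions made for illustration. In the paper the speedup comes from a model over static code features and the runtime input size; here it is simply a hard-coded value.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_time: float      # hypothetical runtime on the CPU, in seconds
    gpu_speedup: float   # hypothetical predicted GPU-over-CPU speedup


def schedule(tasks):
    """Greedy two-device schedule driven by predicted speedup (illustrative only)."""
    # Sort ascending by predicted speedup: the CPU picks from the front,
    # the GPU picks from the back.
    pending = sorted(tasks, key=lambda t: t.gpu_speedup)
    free_at = {"cpu": 0.0, "gpu": 0.0}
    plan = []
    while pending:
        # The device that becomes idle first picks next; ties go to the GPU.
        device = "gpu" if free_at["gpu"] <= free_at["cpu"] else "cpu"
        if device == "gpu":
            task = pending.pop()          # highest predicted speedup
            duration = task.cpu_time / task.gpu_speedup
        else:
            task = pending.pop(0)         # lowest predicted speedup
            duration = task.cpu_time
        start = free_at[device]
        free_at[device] = start + duration
        plan.append((task.name, device, start, free_at[device]))
    # Return the per-task plan and the makespan (time the last device finishes).
    return plan, max(free_at.values())


# Made-up workload: names, runtimes and speedups are placeholders.
tasks = [
    Task("matmul", 8.0, 8.0),
    Task("reduce", 4.0, 2.0),
    Task("scan", 3.0, 0.5),
    Task("stencil", 6.0, 4.0),
]
plan, makespan = schedule(tasks)
for name, device, start, end in plan:
    print(f"{name:8s} -> {device} [{start:.1f}, {end:.1f}]")
print("makespan:", makespan)
```

In this toy workload the kernel predicted to gain most from the GPU runs there first, while the kernel predicted to run slower on the GPU is sent to the CPU. A real scheduler would also have to handle tasks arriving over time and mispredicted speedups, which this sketch ignores.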

Tags: ATI, ATI Radeon HD 7970, Computer science, Heterogeneous systems, Machine learning, nVidia, nVidia GeForce GTX 590, OpenCL, Task scheduling

September 17, 2014 by hgpu
