Merge or Separate? Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms
The University of Edinburgh
Workshop on General Purpose Processing Using GPUs (GPGPU-10), 2017
@inproceedings{wen2017merge,
title={Merge or Separate?: Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms},
author={Wen, Yuan and O'Boyle, Michael FP},
booktitle={Proceedings of the General Purpose GPUs},
pages={22--31},
year={2017},
organization={ACM}
}
Computer systems are increasingly heterogeneous, with nodes consisting of CPUs and GPU accelerators. As such systems become mainstream, they move away from specialized high-performance single-application platforms to a more general setting with multiple concurrent application jobs. Determining how best to schedule jobs dynamically on heterogeneous devices is non-trivial. In some cases, performance is maximized if jobs are allocated to a single device; in others, sharing is preferable. In this paper, we present a runtime framework which schedules multi-user OpenCL tasks to their most suitable device in a CPU/GPU system. We use a machine learning-based predictive model at runtime to decide whether to merge OpenCL kernels or schedule them separately on the most appropriate devices, without the need for ahead-of-time profiling. We evaluate our approach over a wide range of workloads on two separate platforms. We consistently show significant performance and turn-around time improvements over the state of the art across programs, workloads, and platforms.
April 3, 2017 by hgpu