high performance computing on graphics processing units: hgpu.org

Merge or Separate? Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms

Yuan Wen, Michael F.P. O’Boyle

The University of Edinburgh

Workshop about general purpose processing using GPUs (GPGPU-10), 2017

Computer systems are increasingly heterogeneous, with nodes consisting of CPUs and GPU accelerators. As such systems become mainstream, they move away from specialized, high-performance, single-application platforms to a more general setting with multiple concurrent application jobs. Determining how best to schedule jobs dynamically onto heterogeneous devices is non-trivial. In certain cases, performance is maximized if jobs are allocated to a single device; in others, sharing is preferable. In this paper, we present a runtime framework that schedules multi-user OpenCL tasks to their most suitable device in a CPU/GPU system. We use a machine learning-based predictive model at runtime to decide whether to merge OpenCL kernels or schedule them separately to the most appropriate devices, without the need for ahead-of-time profiling. We evaluate our approach over a wide range of workloads on two separate platforms. We consistently show significant performance and turnaround time improvements over the state of the art across programs, workloads, and platforms.
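The merge-or-separate decision described in the abstract can be illustrated with a toy sketch. This is not the authors' framework: it replaces the learned predictive model with a hand-written makespan comparison, and the `Job` type, the per-device runtime estimates, and the `merge_overhead` factor are all hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical per-job runtime estimates in seconds (not from the paper)."""
    name: str
    cpu_time: float
    gpu_time: float


def schedule_pair(a: Job, b: Job, merge_overhead: float = 1.0) -> tuple[str, float]:
    """Pick the plan with the smallest estimated makespan for two jobs.

    merge_overhead is an assumed slowdown factor applied when both jobs
    share one device; a learned model would stand in for these estimates.
    """
    plans = {
        # Both jobs placed on one device: estimated as back-to-back runtime.
        "merge_on_gpu": merge_overhead * (a.gpu_time + b.gpu_time),
        "merge_on_cpu": merge_overhead * (a.cpu_time + b.cpu_time),
        # Jobs separated across devices: they run concurrently.
        "a_gpu_b_cpu": max(a.gpu_time, b.cpu_time),
        "a_cpu_b_gpu": max(a.cpu_time, b.gpu_time),
    }
    best = min(plans, key=plans.get)
    return best, plans[best]


if __name__ == "__main__":
    # GPU-friendly jobs: sharing the GPU beats sending either one to the CPU.
    print(schedule_pair(Job("x", 10.0, 1.0), Job("y", 12.0, 2.0)))
    # Device-neutral jobs: splitting across CPU and GPU wins.
    print(schedule_pair(Job("x", 4.0, 3.0), Job("y", 5.0, 3.0)))
```

Under these assumed numbers, the first pair is best served by sharing the GPU and the second by splitting across devices, which mirrors the abstract's observation that neither policy is always preferable.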

Tags: AMD Radeon HD 7970, ATI, Computer science, Heterogeneous systems, Machine learning, nVidia, nVidia GeForce GTX 780, OpenCL

April 3, 2017 by hgpu
