high performance computing on graphics processing units: hgpu.org

Orchestrating Multiple Data-Parallel Kernels on Multiple Devices

Janghaeng Lee, Mehrzad Samadi, Scott Mahlke

University of Michigan, Ann Arbor

24th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015

@article{lee2015orchestrating,
  title={Orchestrating Multiple Data-Parallel Kernels on Multiple Devices},
  author={Lee, Janghaeng and Samadi, Mehrzad and Mahlke, Scott},
  year={2015}
}

Traditionally, programmers and software tools have focused on mapping a single data-parallel kernel onto a heterogeneous computing system consisting of multiple general-purpose processors (CPUs) and graphics processing units (GPUs). These methodologies break down as application complexity grows to contain multiple communicating data-parallel kernels. This paper introduces MKMD, an automatic system for seamlessly mapping multiple kernels across multiple computing devices. MKMD takes a two-phase approach: coarse-grained scheduling of indivisible kernels, followed by opportunistic fine-grained, workgroup-level partitioning to exploit idle resources. During this process, MKMD considers kernel dependencies and the underlying system, along with an execution-time model built from a few sets of profile data. Based on the scheduling decision, MKMD transparently manages the order of execution and the data transfers for each device. On a real machine with one CPU and two different GPUs, MKMD achieves a mean speedup of 1.89x over in-order execution on the fastest device for a set of applications with multiple kernels. Of this speedup, 52% comes from the coarse-grained scheduling and the other 48% from the fine-grained partitioning.
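
To make the coarse-grained phase concrete, the following is a minimal sketch of dependency-aware scheduling of whole kernels across devices. It is not MKMD's algorithm: it swaps in a simple greedy earliest-finish-time heuristic, ignores data-transfer costs, and omits the fine-grained workgroup partitioning entirely. The function name `schedule_kernels`, the kernel and device names, and all timings are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Placement = Tuple[str, float, float]  # (device, start time, finish time)


def schedule_kernels(
    deps: Dict[str, List[str]],
    est_time: Dict[str, Dict[str, float]],
) -> Tuple[Dict[str, Placement], float]:
    """Greedy earliest-finish-time scheduling of a kernel dependency graph.

    deps[k] lists the kernels that must finish before kernel k starts.
    est_time[k][d] is an assumed, profile-derived run time of k on device d.
    Data-transfer costs are ignored in this sketch.
    """
    # Time at which each device becomes idle.
    device_free = {d: 0.0 for times in est_time.values() for d in times}
    placed: Dict[str, Placement] = {}
    remaining = set(deps)

    while remaining:
        # Kernels whose dependencies have all been scheduled.
        ready = sorted(k for k in remaining if all(p in placed for p in deps[k]))
        if not ready:
            raise ValueError("dependency cycle or unknown kernel in deps")
        for k in ready:
            # A kernel cannot start before its last dependency finishes.
            earliest = max((placed[p][2] for p in deps[k]), default=0.0)
            # Pick the device that finishes this kernel soonest
            # (ties broken by device name for determinism).
            best = min(
                est_time[k],
                key=lambda d: (max(device_free[d], earliest) + est_time[k][d], d),
            )
            start = max(device_free[best], earliest)
            finish = start + est_time[k][best]
            placed[k] = (best, start, finish)
            device_free[best] = finish
            remaining.discard(k)

    makespan = max((f for _, _, f in placed.values()), default=0.0)
    return placed, makespan


if __name__ == "__main__":
    # Hypothetical application: C consumes the outputs of A and B.
    deps = {"A": [], "B": [], "C": ["A", "B"]}
    est_time = {
        "A": {"cpu": 8.0, "gpu0": 2.0, "gpu1": 3.0},
        "B": {"cpu": 9.0, "gpu0": 2.0, "gpu1": 3.0},
        "C": {"cpu": 6.0, "gpu0": 1.0, "gpu1": 2.0},
    }
    placed, makespan = schedule_kernels(deps, est_time)
    for name in sorted(placed):
        print(name, placed[name])
    print("makespan:", makespan)
```

With these made-up timings, the independent kernels A and B run concurrently on the two GPUs, so the schedule finishes at time 4.0 rather than the 5.0 that running all three kernels in order on `gpu0` would take. A real system would also have to account for the cost of moving data between devices, which this sketch leaves out.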

Tags: Computer science, Heterogeneous systems, nVidia, nVidia GeForce GTX 750, nVidia GeForce GTX 760, OpenCL

November 29, 2015 by hgpu
