Orchestrating Multiple Data-Parallel Kernels on Multiple Devices
University of Michigan, Ann Arbor
24th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2015
@inproceedings{lee2015orchestrating,
  title={Orchestrating Multiple Data-Parallel Kernels on Multiple Devices},
  author={Lee, Janghaeng and Samadi, Mehrzad and Mahlke, Scott},
  booktitle={24th International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  year={2015}
}
Traditionally, programmers and software tools have focused on mapping a single data-parallel kernel onto a heterogeneous computing system consisting of multiple general-purpose processors (CPUs) and graphics processing units (GPUs). These methodologies break down as application complexity grows to contain multiple communicating data-parallel kernels. This paper introduces MKMD, an automatic system for mapping multiple kernels across multiple computing devices in a seamless manner. MKMD is a two-phase approach that combines coarse-grained scheduling of indivisible kernels with opportunistic fine-grained, workgroup-level partitioning to exploit idle resources. During this process, MKMD considers kernel dependencies and the underlying system, along with an execution time model built from a small set of profiling runs. Based on the scheduling decision, MKMD transparently manages the order of execution and data transfers for each device. On a real machine with one CPU and two different GPUs, MKMD achieves a mean speedup of 1.89x compared to in-order execution on the fastest device for a set of applications with multiple kernels. 52% of this speedup comes from the coarse-grained scheduling, and the other 48% is the result of the fine-grained partitioning.
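To make the two-phase idea concrete, the following is a minimal Python sketch of the general approach the abstract describes: coarse-grained list scheduling of whole kernels onto devices using a simple execution time model, followed by fine-grained splitting of a kernel's workgroups across devices. It assumes a linear time model (fixed setup cost plus a per-workgroup term) and a greedy placement rule; all class names, numbers, and the splitting heuristic are illustrative assumptions, not MKMD's actual algorithm or the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class Kernel:
    name: str
    workgroups: int                       # total workgroups in the NDRange
    deps: list = field(default_factory=list)

@dataclass
class Device:
    name: str
    setup_cost: float                     # assumed launch/transfer overhead (ms)
    ms_per_workgroup: float               # assumed linear throughput term (ms/workgroup)
    ready_at: float = 0.0                 # time the device becomes free

    def runtime(self, workgroups):
        # Hypothetical linear execution-time model fitted from a few profile runs.
        return self.setup_cost + self.ms_per_workgroup * workgroups

def coarse_schedule(kernels, devices):
    """Phase 1: greedy list scheduling. Each kernel is placed, whole, on the
    device that finishes it earliest given dependencies and device availability."""
    finish, placement = {}, {}
    for k in kernels:                     # assumed topologically ordered
        dep_ready = max((finish[d] for d in k.deps), default=0.0)
        best = min(devices,
                   key=lambda dv: max(dv.ready_at, dep_ready) + dv.runtime(k.workgroups))
        start = max(best.ready_at, dep_ready)
        finish[k.name] = start + best.runtime(k.workgroups)
        best.ready_at = finish[k.name]
        placement[k.name] = best.name
    return placement, finish

def fine_partition(kernel, devices):
    """Phase 2: split one kernel's workgroups across devices so they finish
    roughly together under the same linear model (transfer costs ignored)."""
    inv = [1.0 / d.ms_per_workgroup for d in devices]
    total = sum(inv)
    return {d.name: round(kernel.workgroups * w / total) for d, w in zip(devices, inv)}

if __name__ == "__main__":
    devices = [Device("cpu", 0.2, 0.050),
               Device("gpu0", 0.5, 0.010),
               Device("gpu1", 0.5, 0.015)]
    kernels = [Kernel("A", 4096),
               Kernel("B", 2048, deps=["A"]),
               Kernel("C", 8192, deps=["A"])]
    placement, finish = coarse_schedule(kernels, devices)
    print("coarse placement:", placement)
    # Opportunistically split the kernel that finishes last across all devices.
    longest = max(kernels, key=lambda k: finish[k.name])
    print("fine split of", longest.name, fine_partition(longest, devices))

In the real system the per-kernel partitions would be launched as separate sub-NDRanges on each device, with MKMD handling ordering and data transfers; this sketch only illustrates how a profile-based time model can drive both scheduling phases.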
November 29, 2015 by hgpu