high performance computing on graphics processing units: hgpu.org

Free Launch: Optimizing GPU Dynamic Kernel Launches through Thread Reuse

Guoyang Chen, Xipeng Shen

Computer Science Department, North Carolina State University, 890 Oval Drive, Raleigh, NC, USA 27695

The 48th Annual IEEE/ACM International Symposium on Microarchitecture, 2015

@article{chen2015free,
  title={Free Launch: Optimizing GPU Dynamic Kernel Launches through Thread Reuse},
  author={Chen, Guoyang and Shen, Xipeng},
  year={2015}
}

Supporting dynamic parallelism is important for GPUs to benefit a broad range of applications. There are currently two fundamental ways for programs to exploit dynamic parallelism on a GPU: a software-based approach with software-managed worklists, and a hardware-based approach through dynamic subkernel launches. Neither is satisfactory. The former is complicated to program and is often subject to load imbalance; the latter suffers from large runtime overhead. In this work, we propose free launch, a new software approach that overcomes the shortcomings of both methods. It allows programmers to use subkernel launches to express dynamic parallelism. It employs a novel compiler-based code transformation named subkernel launch removal, which replaces subkernel launches with the reuse of parent threads. Coupled with an adaptive task assignment mechanism, the transformation reassigns the tasks in the subkernels to the parent threads with good load balance. The technique requires no hardware extensions and is immediately deployable on existing GPUs. It keeps the programming convenience of the subkernel launch-based approach while avoiding its large runtime overhead. Meanwhile, its superior load balancing lets it outperform manual worklist-based techniques by 3X on average.
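The effect of replacing subkernel launches with parent-thread reuse can be illustrated with a small host-side model. This is a Python simulation, not the paper's compiler transformation or CUDA code. It assumes each parent thread would launch a subkernel with a known number of child threads. It compares three schedules: one launch per parent; naive reuse, where each parent runs its own children; and a flattened task list spread round-robin over the parent threads. The round-robin policy is an illustrative stand-in for the paper's adaptive task assignment mechanism, and all function names are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[int, int]  # (parent thread id, child index within its subkernel)


def dynamic_launch_schedule(child_counts: List[int]) -> List[List[Task]]:
    """Baseline: every parent with work launches its own subkernel.
    Returns one task list per launch; len() is the number of launches."""
    return [[(p, c) for c in range(n)] for p, n in enumerate(child_counts) if n > 0]


def naive_reuse_schedule(child_counts: List[int]) -> List[List[Task]]:
    """No launches: each parent thread runs its own children sequentially.
    Cheap, but a parent with many children becomes a straggler."""
    return [[(p, c) for c in range(n)] for p, n in enumerate(child_counts)]


def free_launch_schedule(child_counts: List[int], num_parent_threads: int) -> List[List[Task]]:
    """Hypothetical launch-removal sketch: flatten all child tasks into one
    list and let the existing parent threads consume it in a strided loop
    (task t goes to parent thread t % num_parent_threads)."""
    if num_parent_threads <= 0:
        raise ValueError("need at least one parent thread")
    tasks = [(p, c) for p, n in enumerate(child_counts) for c in range(n)]
    per_thread: List[List[Task]] = [[] for _ in range(num_parent_threads)]
    for t, task in enumerate(tasks):
        per_thread[t % num_parent_threads].append(task)
    return per_thread


def max_load(per_thread: List[List[Task]]) -> int:
    """Largest number of tasks any single thread must execute."""
    return max(len(ts) for ts in per_thread)


if __name__ == "__main__":
    counts = [5, 0, 1, 2]  # children requested by parent threads 0..3
    print(len(dynamic_launch_schedule(counts)))               # 3 launches
    print(max_load(naive_reuse_schedule(counts)))             # 5
    print([len(t) for t in free_launch_schedule(counts, 4)])  # [2, 2, 2, 2]
```

With `counts = [5, 0, 1, 2]`, the baseline needs three subkernel launches. Naive reuse avoids launches but leaves parent 0 with five tasks. Redistributing the same eight tasks gives every parent thread two. This mirrors the abstract's two claims: launch overhead disappears, and load balance comes from reassigning tasks rather than from each parent keeping its own.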

Tags: Compilers, Computer science, CUDA, nVidia, Tesla K20, Tesla K40

November 8, 2015 by hgpu
