A CUDA Kernel Scheduler Exploiting Static Data Dependencies

Eva Burrows
Bergen Language Design Laboratory, Department of Informatics, University of Bergen, Norway
5th Workshop on Data-flow Execution Models for Extreme Scale Computing (DFM, at PACT'15), 2015






The CUDA execution model of Nvidia's GPUs is based on the asynchronous execution of thread blocks, where each thread executes the same kernel in a data-parallel fashion. When threads in different thread blocks need to synchronise and communicate, the whole computation launched onto the GPU must be stopped and re-invoked in order to facilitate inter-block synchronisation and communication. The need for synchronisation is tightly connected to the underlying data dependency pattern of the computation. For a good range of algorithms, this pattern is static, scalable and shows some regularity: sorting networks, the Fast Fourier Transform and the stencil computations of PDE solvers are such examples, but parallel design patterns like scan, reduce and the like can also be considered. In such cases, much of the effort of devising and scheduling CUDA kernels for the computation can be automated by exposing the dataflow representation of the computation in the program code via a dedicated API. We present a methodology to build a generic kernel scheduler and a related kernel, both parameterised by this API. A computation formalised in terms of this API then serves as the entry point to these generic computational mechanisms, leading to direct CUDA implementations.
