high performance computing on graphics processing units: hgpu.org


A CUDA Kernel Scheduler Exploiting Static Data Dependencies

Eva Burrows

Bergen Language Design Laboratory, Department of Informatics, University of Bergen, Norway

5th Workshop on Data-flow Execution Models for Extreme Scale Computing (Pact’15), 2015

@article{burrows2015cuda,
  title={A CUDA Kernel Scheduler Exploiting Static Data Dependencies},
  author={Burrows, Eva},
  year={2015}
}


The CUDA execution model of Nvidia’s GPUs is based on the asynchronous execution of thread blocks, where each thread executes the same kernel in a data-parallel fashion. When threads in different thread blocks need to synchronise and communicate, the whole computation launched onto the GPU must be stopped and re-invoked to make interblock synchronisation and communication possible. The need for synchronisation is tightly connected to the underlying data dependency pattern of the computation. For a good range of algorithms, this pattern is static, scalable and shows some regularity. Examples include sorting networks, the Fast Fourier Transform and the stencil computations of PDE solvers, as well as parallel design patterns such as scan and reduce. In such cases, much of the effort of devising and scheduling CUDA kernels for the computation can be automated by exposing the dataflow representation of the computation in the program code through a dedicated API. We present a methodology for building a generic kernel scheduler and a related kernel, parameterised by this API. A computation formalised in terms of this API then serves as the entry point to these generic computational mechanisms, leading to direct CUDA implementations.
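
The idea of a static dependency pattern driving a generic scheduler can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the API or scheduler from the paper: it is written in Python rather than CUDA, it uses a bitonic sorting network as the example pattern, and the names `bitonic_steps`, `compare_exchange` and `run_schedule` are hypothetical. Each iteration of the outer loop stands in for one kernel launch (the only point where all blocks are synchronised), and the inner loop stands in for the independent threads of that launch.

```python
from typing import Callable, List, Tuple

Step = Tuple[int, int]  # (k, j): parameters of one step of the network


def bitonic_steps(n: int) -> List[Step]:
    """Static schedule of a bitonic sorting network for n elements.

    The schedule depends only on n (assumed to be a power of two),
    never on the data, so it can be computed before any launch.
    """
    steps = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            steps.append((k, j))
            j //= 2
        k *= 2
    return steps


def compare_exchange(src: List[int], dst: List[int], i: int, step: Step) -> None:
    """Hypothetical per-thread body: element i depends on itself and one partner."""
    k, j = step
    partner = i ^ j                 # static data dependency of element i
    ascending = (i & k) == 0        # sort direction of the enclosing subsequence
    low = min(src[i], src[partner])
    high = max(src[i], src[partner])
    keep_low = (i < partner) == ascending
    dst[i] = low if keep_low else high


def run_schedule(data: List[int], steps: List[Step],
                 kernel: Callable[[List[int], List[int], int, Step], None]) -> List[int]:
    """Hypothetical generic scheduler, parameterised by the step list and the kernel."""
    src = list(data)
    for step in steps:              # one simulated kernel launch per step
        dst = [0] * len(src)
        for i in range(len(src)):   # simulated threads; order does not matter
            kernel(src, dst, i, step)
        src = dst                   # results become visible to the next launch
    return src


values = [7, 3, 0, 5, 6, 1, 4, 2]
result = run_schedule(values, bitonic_steps(len(values)), compare_exchange)
```

Because every simulated thread reads only from `src` and writes only its own slot in `dst`, the threads within one step are independent, which is the property that lets a single launch cover a whole step. Swapping in a different step list and per-element body (for example an FFT butterfly) would reuse `run_schedule` unchanged.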

Tags: Computer science, CUDA, nVidia, Task scheduler

December 15, 2015 by hgpu

