A CUDA Kernel Scheduler Exploiting Static Data Dependencies

Eva Burrows
Bergen Language Design Laboratory, Department of Informatics, University of Bergen, Norway
5th Workshop on Data-flow Execution Models for Extreme Scale Computing (Pact’15), 2015


   title={A CUDA Kernel Scheduler Exploiting Static Data Dependencies},

   author={Burrows, Eva},



Download Download (PDF)   View View   Source Source   



The CUDA execution model of Nvidia’s GPUs is based on the asynchronous execution of thread blocks, where each thread executes the same kernel in a data-parallel fashion. When threads in different thread blocks need to synchronise and communicate, the whole computation launched onto the GPU needs to be stopped and re-invoked in order to facilitate interblock synchronisations and communication. The need for synchronisation is tightly connected with the underlying data dependency pattern of the computation. For a good range of algorithms, the underlying data dependency pattern is static, scalable and shows some regularity. For instance, sorting networks, the Fast Fourier Transform, stencil computations of PDE solvers are such examples, but parallel design patterns like scan, reduce, and alike can also be considered. In such cases, much of the effort of devising and scheduling CUDA kernels for the computation can be automatized by exposing the dataflow representation of the computation in the program code using a dedicated API. We present a methodology to build a generic kernel scheduler and related kernel parameterised by this API. A computation formalised in the terms of this API then serves as the entry point to these generic computational mechanisms, leading to direct CUDA implementations.
Rating: 1.5/5. From 2 votes.
Please wait...

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: