Concurrent kernel execution on Graphic Processing Units

Adrien Cassagne, Aurelien George, Benjamin Lorendeau, Jean-Charles Papin, Antoine Rougier
Universite de Bordeaux I
Universite de Bordeaux I, 2013








General-purpose graphics processing units (GPGPUs) are now used in high-performance computing (HPC) for their massively parallel computing capabilities. These devices integrate hundreds of computing units (cores). Usually, this level of parallelism is used to solve simulation problems (heat transfer, …) because the simulated environment is represented numerically as matrices. GPUs can be programmed with dedicated languages such as CUDA and OpenCL, which provide a standard environment (C/C++ libraries). Programs executed on a GPU (also called kernels) are normally executed sequentially. However, to maximize the use of GPU resources, some advanced features (developed by NVIDIA) allow programmers to execute several kernels in parallel on the GPU. Unfortunately, concurrent kernel execution is only possible with CUDA on NVIDIA graphics cards; for other cards, OpenCL does not offer this functionality. That is why researchers from the University of Virginia (USA) [2] tried to extend the OpenCL standard by allowing the execution of a "master kernel" that launches other kernels. In fact, the "master kernel" is a mix of memory-bound and compute-bound kernels. By doing this, they could evaluate the advantages of this kind of solution. Another group of researchers (from George Washington University and the University of Arkansas) designed a software environment that allows different threads of the same process to share access to the GPU, which was not possible until the introduction of the "Automatic Context Funneling" [2] capability in CUDA 4.0. For our PER (Projet d’Etude et de Recherche), we will analyse the benefits and limitations of concurrent kernel execution. We will also determine whether parallel kernel execution can be used to hide the cost of data transfers from the host to the GPU (by starting a long-running kernel before starting the data transfers).