Concurrent kernel execution on Graphic Processing Units

Adrien Cassagne, Aurelien George, Benjamin Lorendeau, Jean-Charles Papin, Antoine Rougier

Universite de Bordeaux I

Universite de Bordeaux I, 2013

@article{cassagne2013concurrent,
  title={Concurrent kernel execution on Graphic Processing Units},
  author={Cassagne, Adrien and George, Aur{\'e}lien and Lorendeau, Benjamin and Papin, Jean-Charles and Rougier, Antoine},
  year={2013}
}

General-Purpose Graphics Processing Units (GPGPUs) are now used in high-performance computing (HPC) for their massively parallel computing capabilities. These devices integrate hundreds of computing units (cores). This level of parallelism is typically used to solve simulation problems (heat transfer, …) because the simulated environment is represented numerically as matrices. GPUs can be programmed with dedicated languages such as CUDA and OpenCL, which provide a standard environment (C/C++ libraries). Programs executed on a GPU, called kernels, are normally executed sequentially, one after another. However, to maximize the use of GPU resources, some advanced features developed by NVIDIA allow programmers to execute several kernels concurrently on the GPU. Unfortunately, concurrent kernel execution is only possible with CUDA on NVIDIA graphics cards; OpenCL does not offer this functionality for other cards.

This is why researchers from the University of Virginia (USA) [2] tried to extend the OpenCL standard by allowing the execution of a "master kernel" that launches other kernels. The "master kernel" combines memory-bound and compute-bound kernels, which allowed them to evaluate the benefits of this kind of solution. Another group of researchers, from George Washington University and the University of Arkansas, designed a software environment that allows different threads of the same process to share access to the GPU, which was not possible until the introduction of the "Automatic Context Funneling" [2] capabilities in CUDA 4.0.

For our PER (Projet d’Etude et de Recherche, a study and research project), we will analyse the benefits and limitations of concurrent kernel execution. We will also determine whether concurrent kernel execution can be used to hide the cost of host-to-GPU data transfers, by starting a kernel with a long computation time before starting the transfers.
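The two ideas above, running several kernels at once and launching a long-running kernel before starting host-to-GPU transfers, can be sketched with CUDA streams. This is a minimal sketch under stated assumptions, not the authors' code: the kernel names `long_compute` and `scale`, the helper `run_concurrent_kernels`, the grid sizes and the iteration count are all hypothetical choices. The long kernel is deliberately launched on only a few blocks so that SMs remain free for the second kernel; whether the kernels and the copy really overlap still depends on the device and its free resources.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failing CUDA call and make the enclosing bool function return false.
// (Resources allocated before the failure are not released in this sketch.)
#define CUDA_OK(call)                                                  \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            return false;                                              \
        }                                                              \
    } while (0)

// Compute-bound kernel (hypothetical): a long chain of dependent
// multiply-adds per element, launched on a small grid.
__global__ void long_compute(float* work, int n, int iters) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        float x = work[i];
        for (int k = 0; k < iters; ++k) x = x * 0.999f + 0.001f;
        work[i] = x;
    }
}

// Memory-bound kernel (hypothetical): one load and one store per element.
__global__ void scale(const float* in, float* out, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];
}

// Hypothetical helper: returns true if the scaled data copied back is correct.
bool run_concurrent_kernels(int n) {
    const int threads = 256;
    const int work_blocks = 4;  // small grid leaves most SMs idle
    const int work_n = work_blocks * threads;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Pinned host buffers: required for cudaMemcpyAsync to overlap with kernels.
    float *h_in = nullptr, *h_out = nullptr;
    CUDA_OK(cudaMallocHost(&h_in, bytes));
    CUDA_OK(cudaMallocHost(&h_out, bytes));
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 1000);

    float *d_work = nullptr, *d_in = nullptr, *d_out = nullptr;
    CUDA_OK(cudaMalloc(&d_work, work_n * sizeof(float)));
    CUDA_OK(cudaMalloc(&d_in, bytes));
    CUDA_OK(cudaMalloc(&d_out, bytes));
    CUDA_OK(cudaMemset(d_work, 0, work_n * sizeof(float)));

    cudaStream_t compute, transfer;
    CUDA_OK(cudaStreamCreate(&compute));
    CUDA_OK(cudaStreamCreate(&transfer));

    // 1. Start the long-running kernel first, in its own stream.
    long_compute<<<work_blocks, threads, 0, compute>>>(d_work, work_n, 1 << 18);
    CUDA_OK(cudaGetLastError());

    // 2. Host-to-device copy in a second stream: it can proceed while
    //    long_compute is still running.
    CUDA_OK(cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, transfer));

    // 3. This kernel waits for the copy (same stream) but may run
    //    concurrently with long_compute on the SMs it leaves idle.
    scale<<<(n + threads - 1) / threads, threads, 0, transfer>>>(d_in, d_out, n, 2.0f);
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, transfer));

    CUDA_OK(cudaDeviceSynchronize());  // wait for both streams

    bool ok = true;
    for (int i = 0; i < n && ok; ++i) ok = (h_out[i] == 2.0f * h_in[i]);

    cudaStreamDestroy(compute);
    cudaStreamDestroy(transfer);
    cudaFree(d_work);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return ok;
}
```

Compiled with `nvcc` together with a `main` that calls `run_concurrent_kernels(1 << 20)`, the sketch only checks that the results are correct; whether the kernels and the copy actually overlapped is visible in a timeline profiler such as Nsight Systems. Device support can be queried beforehand through the `concurrentKernels` and `asyncEngineCount` fields of `cudaDeviceProp`.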

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 660, nVidia Quadro 4000, OpenCL, Performance, Thesis

October 21, 2013 by hgpu