high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Concurrent kernel execution on Graphic Processing Units

Concurrent kernel execution on Graphic Processing Units

Adrien Cassagne, Aurelien George, Benjamin Lorendeau, Jean-Charles Papin, Antoine Rougier

Universite de Bordeaux I

Universite de Bordeaux I, 2013

BibTeX

Download (PDF)

View

Source

2681

views

General Purpose Graphic Processing Unit (GPGPU) are now used in high performance computing (HPC) for their massively parallel computing aspect and capabilities. Those devices integrate hundreds of computing unit (computing core). Usually, such a level of parallelism is used to solve simulation problems (heat transfer, …) because of the numerical representation of simulated environment (matrices). Those GPU can be programmed with specific programming languages like CUDA and OpenCL which provide a standard environment (C/C++ libraries). Programs executed on a GPU (also called kernels) are executed sequentially. However, in order to maximize the usage of GPU resources, some advanced features (developed by NVIDIA) allow programmers to execute severals kernels in parallel on the GPU. Unfortunately, concurrent kernels execution is only possible with CUDA on NVIDIA graphics cards. For other cards, OpenCL does not offer this functionality. That is why researchers from University of Virginia (USA) [2], tried to extend OpenCL standard by allowing execution of an "master kernel" which will launch other kernels. In fact, the "master kernel" is a mix of memory-bound and compute-bound kernels. By doing this, they could evaluate the advantage of this kind of solution. Another group of researchers (from University of George Washington and from University of Arkansas), designed a software environment that allows different threads from the same process to share access to the GPU, which wasn’t possible until the introduction of the "Automatic Context Funneling" [2] capabilities in CUDA 4.0. For our PER (Projet d’Etude et de Recherche), we will analyse the benefits and limitations of concurrent kernel execution. We will also determine if parallel kernel execution can be used to avoid the cost of data transfers from the host to the GPU (by starting long computing time kernel before starting data transfers).

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 660, nVidia Quadro 4000, OpenCL, Performance, Thesis

October 21, 2013 by hgpu

No votes yet.

Please wait...

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

* * *

high performance computing on graphics processing units: hgpu.org

Concurrent kernel execution on Graphic Processing Units

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Most viewed papers (last 30 days)

Concurrent kernel execution on Graphic Processing Units

Share this:

Recent source codes

Most viewed papers (last 30 days)