Multi-threaded Kernel Offloading to GPGPU Using Hyper-Q on Kepler Architecture

Florian Wende, Thomas Steinke, Frank Cordes
Zuse Institute Berlin, Takustrasse 7, D-14195 Berlin, Germany
Zuse Institute Berlin, Report (14-19), 2014








Small-scale computations usually cannot fully utilize the compute capabilities of modern GPGPUs. With the Fermi GPU architecture, Nvidia introduced concurrent kernel execution, which allows up to 16 GPU kernels to execute simultaneously on a shared GPU device and thereby improves utilization of its resources. Insufficient scheduling capabilities, however, can reduce the concurrency level achieved in practice to well below this theoretical limit. With the Kepler GPU architecture, Nvidia addresses this issue by introducing the Hyper-Q feature, which provides 32 hardware-managed work queues for concurrent kernel execution. We investigate the Hyper-Q feature within heterogeneous workloads in which multiple concurrent host threads or processes each offload computations to the GPU. By means of a synthetic benchmark kernel and a hybrid parallel CPU-GPU real-world application, we evaluate the performance obtained with Hyper-Q on the GPU and compare it against a kernel reordering mechanism introduced by the authors for the Fermi architecture.
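The offloading pattern described in the abstract can be illustrated with a minimal sketch, which is not the authors' benchmark: several host threads each submit a small kernel through a private CUDA stream, so that the hardware may overlap them where independent work queues are available. The kernel `scale`, the helper `worker`, the thread count, and the problem size are all assumptions chosen for illustration; whether the kernels actually run concurrently depends on the device and driver.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical small kernel: too little work to fill a GPU on its own.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Each host thread offloads its own kernel through a private stream.
// Independent streams are what allow the hardware to overlap kernels.
void worker(int id, int n)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemsetAsync(d, 0, n * sizeof(float), stream);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads, 0, stream>>>(d, n, 2.0f);

    // Wait only for this thread's stream, not for the whole device.
    cudaError_t err = cudaStreamSynchronize(stream);
    if (err != cudaSuccess)
        std::printf("thread %d: %s\n", id, cudaGetErrorString(err));

    cudaFree(d);
    cudaStreamDestroy(stream);
}

int main()
{
    const int numThreads = 8;   // assumption; Hyper-Q offers up to 32 work queues
    const int n = 1 << 12;      // assumption; deliberately small workload

    cudaSetDevice(0);

    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t)
        pool.emplace_back(worker, t, n);
    for (auto &th : pool)
        th.join();

    cudaDeviceSynchronize();
    return 0;
}
```

All host threads here live in one process and share the device through the CUDA runtime; the multi-process case mentioned in the abstract is not covered by this sketch.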
