Multi-threaded Kernel Offloading to GPGPU Using Hyper-Q on Kepler Architecture
Zuse Institute Berlin, Takustrasse 7, D-14195 Berlin, Germany
Zuse Institute Berlin, Report (14-19), 2014
@techreport{wende2014multithreaded,
title={Multi-threaded Kernel Offloading to GPGPU Using Hyper-Q on Kepler Architecture},
author={Wende, Florian and Steinke, Thomas and Cordes, Frank},
institution={Zuse Institute Berlin},
number={14-19},
year={2014}
}
Small-scale computations usually cannot fully utilize the compute capabilities of modern GPGPUs. With the Fermi GPU architecture, Nvidia introduced the concurrent kernel execution feature, which allows up to 16 GPU kernels to execute simultaneously on a shared GPU device and so makes better use of its resources. Insufficient scheduling capabilities in this respect, however, can significantly reduce the theoretical concurrency level. With the Kepler GPU architecture, Nvidia addresses this issue by introducing the Hyper-Q feature, which provides 32 hardware-managed work queues for concurrent kernel execution. We investigate the Hyper-Q feature within heterogeneous workloads in which multiple concurrent host threads or processes each offload computations to the GPU. Using a synthetic benchmark kernel and a hybrid parallel CPU-GPU real-world application, we evaluate the performance obtained with Hyper-Q on the GPU. We compare it against a kernel reordering mechanism that the authors introduced for the Fermi architecture.
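The scheduling limitation mentioned above can be illustrated with a toy model. On Fermi, kernels from all streams are multiplexed into a single hardware work queue. A kernel that is still waiting on its own stream predecessor can therefore block ready kernels from other streams that are queued behind it. Hyper-Q gives each stream (up to 32) its own queue.

The Python sketch below is a simplified discrete-event model under these assumptions:

* integer kernel durations;
* depth-first issue order, where each host thread issues all its kernels in turn;
* stream `s` mapped to queue `s % num_queues`;
* no resource limit other than a cap on resident kernels.

The function name `makespan` and its parameters are hypothetical. This is not the report's benchmark, and it does not implement the authors' kernel reordering mechanism.

```python
from collections import deque


def makespan(streams, num_queues, max_concurrent):
    """Toy model (hypothetical) of kernel dispatch from hardware work queues.

    streams: one list of integer kernel durations per host thread/stream.
    num_queues: number of hardware work queues (1 ~ Fermi, 32 ~ Kepler Hyper-Q).
    max_concurrent: maximum number of kernels resident at once.
    Returns the time at which the last kernel finishes.
    """
    # Assumption: each host thread issues all of its kernels before the next
    # thread does (depth-first order); stream s goes to queue s % num_queues.
    queues = [deque() for _ in range(num_queues)]
    for s, kernels in enumerate(streams):
        for i, dur in enumerate(kernels):
            queues[s % num_queues].append((s, i, dur))

    finished = [0] * len(streams)  # kernels completed so far, per stream
    running = []                   # (finish_time, stream) of resident kernels
    t = 0
    while any(queues) or running:
        # Dispatch only from queue heads. A head whose stream predecessor has
        # not finished blocks everything behind it in the same queue.
        dispatched = True
        while dispatched:
            dispatched = False
            for q in queues:
                if q and len(running) < max_concurrent:
                    s, i, dur = q[0]
                    if finished[s] == i:  # all earlier kernels of stream s done
                        q.popleft()
                        running.append((t + dur, s))
                        dispatched = True
        # Advance simulated time to the next kernel completion.
        t = min(f for f, _ in running)
        for f, s in running:
            if f == t:
                finished[s] += 1
        running = [(f, s) for f, s in running if f != t]
    return t


if __name__ == "__main__":
    work = [[1, 1] for _ in range(4)]  # 4 host threads, 2 dependent kernels each
    print(makespan(work, num_queues=1, max_concurrent=16))   # -> 5
    print(makespan(work, num_queues=32, max_concurrent=32))  # -> 2
```

Take four threads that each issue two dependent unit-time kernels. The single-queue model finishes at t = 5 and the 32-queue model at t = 2, because the false dependencies in the shared queue serialize most of the work. In this toy model the single-queue result also depends on issue order. That gives some intuition for why reordering kernel launches can help on Fermi. Real hardware behavior further depends on occupancy and launch overheads, which are not modeled here.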
June 12, 2014 by hgpu