Exploiting concurrent kernel execution on graphic processing units
ECE Department, The George Washington University
International Conference on High Performance Computing and Simulation (HPCS), 2011
@inproceedings{wang2011exploiting,
title={Exploiting concurrent kernel execution on graphic processing units},
author={Wang, L. and Huang, M. and El-Ghazawi, T.},
booktitle={High Performance Computing and Simulation (HPCS), 2011 International Conference on},
pages={24--32},
year={2011},
organization={IEEE}
}
Graphics processing units (GPUs) have been accepted as a powerful and viable coprocessor solution in the high-performance computing domain. To maximize the benefit of GPUs for a multicore platform, a mechanism is needed for the CPU threads of a parallel application to share this computing resource efficiently. NVIDIA’s Fermi architecture pioneers the feature of concurrent kernel execution; however, only kernels of the same thread context can execute in parallel. To get the best use of a GPU device in a multi-threaded application environment, this paper explores techniques for effectively sharing a context, i.e., context funneling, which can be done either manually at the application level or automatically by the GPU runtime starting from CUDA v4.0. In synthetic microbenchmark tests, we find that both funneling mechanisms are better able to exploit concurrent kernel execution than traditional context switching, and therefore improve overall application performance. We also find that the manual funneling mechanism provides the highest performance and more explicit control, while CUDA v4.0 provides better productivity with good performance. Finally, we assess the impact of such techniques on a compact application benchmark, SSCA#3 – SAR sensor processing.
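The manual funneling idea in the abstract can be illustrated with a host-only C++ sketch. This is not the authors' implementation: `ContextFunnel` and `funnel_demo` are hypothetical names, and the GPU is only simulated. The assumption is that a single funnel thread owns the shared context and that application threads hand it their work through a queue. In a real program each submitted task would issue an asynchronous kernel launch into its own stream, so kernels from different application threads could overlap even though one thread submits them all.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: one "funnel" thread owns the (simulated) GPU context
// and executes every task that the application threads submit.
class ContextFunnel {
public:
    ContextFunnel() : worker_([this] { run(); }) {}

    ~ContextFunnel() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // returns only after the queue has been drained
    }

    // Callable from any application thread. In a real program the task would
    // issue an asynchronous kernel launch inside the shared context.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutdown requested, nothing left
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs on the funnel thread, i.e. in the shared context
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Hypothetical driver: num_threads application threads each submit
// launches_per_thread simulated launches; returns how many were executed.
int funnel_demo(int num_threads, int launches_per_thread) {
    int launched = 0;  // modified only by the funnel thread
    {
        ContextFunnel funnel;
        std::vector<std::thread> app_threads;
        for (int t = 0; t < num_threads; ++t) {
            app_threads.emplace_back([&funnel, &launched, launches_per_thread] {
                for (int i = 0; i < launches_per_thread; ++i)
                    funnel.submit([&launched] { ++launched; });
            });
        }
        for (auto& th : app_threads) th.join();
    }  // funnel destructor drains the queue and joins the funnel thread
    return launched;
}
```

Calling `funnel_demo(4, 8)` has four application threads route eight simulated launches each through the single funnel thread. The sketch models only the ownership pattern; it says nothing about the performance differences the paper measures.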
November 19, 2011 by hgpu