Improving GPU Performance via Large Warps and Two-Level Warp Scheduling
The University of Texas at Austin
44th International Symposium on Microarchitecture (MICRO), 2011
@inproceedings{narasiman2011improving,
title={Improving GPU Performance via Large Warps and Two-Level Warp Scheduling},
author={Narasiman, V. and Shebanow, M. and Lee, C.J. and Miftakhutdinov, R. and Mutlu, O. and Patt, Y.N.},
booktitle={44th International Symposium on Microarchitecture (MICRO)},
year={2011}
}
Due to their massive computational power, graphics processing units (GPUs) have become a popular platform for executing general-purpose parallel applications. GPU programming models allow the programmer to create thousands of threads, each executing the same computing kernel. GPUs exploit this parallelism in two ways. First, threads are grouped into fixed-size SIMD batches known as warps, and second, many such warps are concurrently executed on a single GPU core. Despite these techniques, the computational resources on GPU cores are still underutilized, resulting in performance far short of what could be delivered. Two reasons for this are conditional branch instructions and stalls due to long-latency operations. To improve GPU performance, computational resources must be more effectively utilized. To accomplish this, we propose two independent ideas: the large warp microarchitecture and two-level warp scheduling. We show that when combined, our mechanisms improve performance by 19.1% over traditional GPU cores for a wide variety of general-purpose parallel applications that heretofore have not been able to fully exploit the available resources of the GPU chip.
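To make the second idea concrete, below is a minimal C++ sketch of a two-level round-robin warp scheduling policy. All class names and parameters here are hypothetical illustrations, not taken from the paper: warps are partitioned into fetch groups, the scheduler round-robins within the active group, and it advances to the next group only when every warp in the active group is stalled.

// Minimal sketch of a two-level round-robin warp scheduler.
// Hypothetical names and parameters; the real scheduler is a hardware
// structure, and the paper evaluates specific warp counts and fetch
// group sizes.
#include <cstdio>
#include <vector>

struct Warp {
    bool ready;      // can the warp's next instruction issue this cycle?
    int stall_left;  // cycles remaining on a long-latency stall
};

class TwoLevelScheduler {
public:
    TwoLevelScheduler(int num_warps, int group_size)
        : warps_(num_warps, Warp{true, 0}),
          group_size_(group_size),
          num_groups_(num_warps / group_size),
          active_group_(0),
          rr_(0) {}

    // Round-robin within the active fetch group; only when every warp
    // in that group is stalled does the scheduler move on to the next
    // group, so groups tend to reach long-latency stalls at different times.
    int pick() {
        for (int g = 0; g < num_groups_; ++g) {
            int group = (active_group_ + g) % num_groups_;
            for (int i = 0; i < group_size_; ++i) {
                int w = group * group_size_ + (rr_ + i) % group_size_;
                if (warps_[w].ready) {
                    active_group_ = group;
                    rr_ = (w % group_size_ + 1) % group_size_;
                    return w;
                }
            }
        }
        return -1;  // every warp is stalled this cycle
    }

    // Mark a warp as stalled on a long-latency operation (e.g. a cache miss).
    void stall(int w, int cycles) { warps_[w] = Warp{false, cycles}; }

    // Advance one cycle: count down stalls and wake finished warps.
    void tick() {
        for (Warp& w : warps_)
            if (!w.ready && --w.stall_left <= 0) w.ready = true;
    }

private:
    std::vector<Warp> warps_;
    int group_size_, num_groups_, active_group_, rr_;
};

int main() {
    TwoLevelScheduler sched(/*num_warps=*/8, /*group_size=*/4);
    sched.stall(0, 3);  // pretend warp 0 just missed in the cache
    for (int cycle = 0; cycle < 4; ++cycle) {
        printf("cycle %d: issue warp %d\n", cycle, sched.pick());
        sched.tick();
    }
    return 0;
}

Because the fetch groups stall at staggered points in the program, some warps are usually ready to issue while others wait on memory, which is the latency-hiding effect the abstract attributes to two-level warp scheduling.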
November 11, 2011 by hgpu