A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow
University of British Columbia
20th IEEE International Symposium on High-Performance Computer Architecture (HPCA-20), 2014
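@inproceedings{eltantawy2014scalable,
title={A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow},
author={ElTantawy, Ahmed and Ma, Jessica Wenjie and O'Connor, Mike and Aamodt, Tor M.},
booktitle={20th IEEE International Symposium on High-Performance Computer Architecture (HPCA-20)},
year={2014}
}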
Graphics processing units (GPUs) are increasingly used for non-graphics computing. However, applications with divergent control flow incur performance degradation on current GPUs. These GPUs implement the SIMT execution model by serializing the execution of different control flow paths encountered by a warp. This serialization can mask thread-level parallelism among the scalar threads comprising a warp, thus degrading performance. In this paper, we propose a novel branch divergence handling mechanism that enables interleaved execution of divergent paths within a warp while maintaining immediate postdominator reconvergence. This multi-path microarchitecture decouples divergence and reconvergence tracking by replacing the stack-based structure typically employed to support SIMT execution with two tables: a warp split table and a warp reconvergence table. It also enables reconvergence before the immediate postdominator, which is important for efficient execution of unstructured control flow. Evaluated on a set of benchmarks with complex divergent control flow, our proposal achieves up to a 7x speedup, with a harmonic mean of 32%, over conventional single-path SIMT execution.
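The abstract names the two tables but does not give their layout. As a rough illustration only, the Python sketch below models one way such a split table and reconvergence table could cooperate. The class names, the field names (`pc`, `rpc`, `mask`, `full_mask`, `pending`) and the methods `diverge` and `arrive` are all hypothetical and are not taken from the paper. Early reconvergence before the immediate postdominator is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """One row of a hypothetical warp split table: a schedulable path."""
    pc: int    # next instruction for this group of threads
    rpc: int   # reconvergence PC this group is heading toward
    mask: int  # bitmask of threads active on this path


@dataclass
class Reconv:
    """One row of a hypothetical warp reconvergence table."""
    pc: int         # the reconvergence point itself
    rpc: int        # reconvergence PC of the merged (outer) split
    full_mask: int  # threads expected to meet here
    pending: int    # threads that have not arrived yet


class MultiPathWarp:
    def __init__(self, pc, mask, exit_pc):
        self.splits = [Split(pc, exit_pc, mask)]
        self.reconv = {}  # toy assumption: at most one entry per reconvergence PC

    def diverge(self, split, taken_pc, fall_pc, taken_mask, rpc):
        """Handle a branch: on divergence, replace one split with two."""
        taken = split.mask & taken_mask
        fall = split.mask & ~taken_mask
        if taken == 0 or fall == 0:  # all threads agree, no divergence
            split.pc = taken_pc if taken else fall_pc
            return
        self.splits.remove(split)
        self.reconv[rpc] = Reconv(rpc, split.rpc, split.mask, split.mask)
        # Both paths become schedulable, unlike a stack that exposes only the top.
        self.splits.append(Split(taken_pc, rpc, taken))
        self.splits.append(Split(fall_pc, rpc, fall))

    def arrive(self, split):
        """A split reached its reconvergence PC: retire it and maybe merge."""
        assert split.pc == split.rpc
        entry = self.reconv[split.rpc]
        self.splits.remove(split)
        entry.pending &= ~split.mask
        if entry.pending == 0:  # every expected thread has arrived
            del self.reconv[entry.pc]
            self.splits.append(Split(entry.pc, entry.rpc, entry.full_mask))


# A 4-thread warp hits an if/else: threads 0-1 take the branch, 2-3 fall through.
warp = MultiPathWarp(pc=0, mask=0b1111, exit_pc=100)
warp.diverge(warp.splits[0], taken_pc=10, fall_pc=20, taken_mask=0b0011, rpc=30)
path_a, path_b = warp.splits  # two entries that a scheduler could interleave
path_a.pc = 30
warp.arrive(path_a)           # first arrival only clears its pending bits
path_b.pc = 30
warp.arrive(path_b)           # second arrival merges the warp at PC 30
```

In this toy model the split table holds every live path at once, which is what would let a scheduler interleave them, while the reconvergence table only tracks which threads are still outstanding at each merge point. A stack-based scheme would instead expose only the top-of-stack path for execution.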
February 9, 2014 by hgpu