A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow

Ahmed ElTantawy, Jessica Wenjie Ma, Mike O’Connor, Tor M. Aamodt
University of British Columbia
20th IEEE International Symposium on High-Performance Computer Architecture (HPCA-20), 2014








Graphics processing units (GPUs) are increasingly used for non-graphics computing. However, applications with divergent control flow incur performance degradation on current GPUs. These GPUs implement the SIMT execution model by serializing the execution of the different control flow paths encountered by a warp. This serialization can mask thread-level parallelism among the scalar threads comprising a warp, thus degrading performance. In this paper, we propose a novel branch divergence handling mechanism that enables interleaved execution of divergent paths within a warp while maintaining immediate postdominator reconvergence. This multi-path microarchitecture decouples divergence and reconvergence tracking by replacing the stack-based structure typically employed to support SIMT execution with two tables: a warp split table and a warp reconvergence table. It also enables reconvergence before the immediate postdominator, which is important for efficient execution of unstructured control flow. Evaluated on a set of benchmarks with complex divergent control flow, our proposal achieves up to a 7x speedup, with a harmonic mean speedup of 32%, over conventional single-path SIMT execution.
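To make the two-table idea in the abstract more concrete, here is a toy Python model of one warp. It is a sketch under stated assumptions, not the paper's hardware design: the names `Split`, `Reconv`, `ToyWarp`, `diverge`, and `arrive` are hypothetical, thread groups are plain integer bitmasks, and the reconvergence PC is passed in by the caller rather than computed from the control flow graph. The point it illustrates is that, after a divergent branch, both paths sit in the split table at the same time (so a scheduler could interleave them), while a separate table tracks which threads are still pending at the reconvergence point.

```python
from dataclasses import dataclass

# Toy model only: all names and fields here are illustrative assumptions.

@dataclass(eq=False)
class Split:
    pc: int    # next PC for this group of threads
    rpc: int   # PC where this group should reconverge (-1 if none)
    mask: int  # bitmask of the threads in this group

@dataclass(eq=False)
class Reconv:
    pc: int       # the reconvergence PC itself
    rpc: int      # reconvergence PC of the enclosing region
    mask: int     # threads expected to meet at this PC
    pending: int  # threads that have not arrived yet

class ToyWarp:
    def __init__(self, mask):
        self.splits = [Split(pc=0, rpc=-1, mask=mask)]  # schedulable paths
        self.reconv = []                                # pending merges

    def diverge(self, split, taken_mask, taken_pc, fall_pc, reconv_pc):
        """Handle a branch executed by `split`."""
        taken = split.mask & taken_mask
        not_taken = split.mask & ~taken_mask
        if taken == 0 or not_taken == 0:
            # Uniform branch: no divergence, just update the PC.
            split.pc = taken_pc if taken else fall_pc
            return
        # Divergent branch: one split becomes two, both schedulable,
        # and a reconvergence entry waits for all of their threads.
        self.splits.remove(split)
        self.reconv.append(
            Reconv(reconv_pc, split.rpc, split.mask, split.mask))
        self.splits.append(Split(taken_pc, reconv_pc, taken))
        self.splits.append(Split(fall_pc, reconv_pc, not_taken))

    def arrive(self, split):
        """Handle `split` reaching its reconvergence PC."""
        self.splits.remove(split)
        for entry in self.reconv:
            covers = entry.mask & split.mask == split.mask
            if entry.pc == split.rpc and covers:
                entry.pending &= ~split.mask
                if entry.pending == 0:
                    # Everyone arrived: the merged group runs again.
                    self.reconv.remove(entry)
                    self.splits.append(
                        Split(entry.pc, entry.rpc, entry.mask))
                return
```

In this sketch, a four-thread warp created with `ToyWarp(0b1111)` that diverges with `taken_mask=0b0011` ends up with two entries in `splits`, one per path. A stack-based single-path scheme would instead expose only the top-of-stack path until it reached the reconvergence point. Calling `arrive` on each split in turn clears its threads from `pending`, and the merged group reappears in `splits` once `pending` reaches zero.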
