high performance computing on graphics processing units: hgpu.org

A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow

Ahmed ElTantawy, Jessica Wenjie Ma, Mike O’Connor, Tor M. Aamodt

University of British Columbia

20th IEEE International Symposium on High-Performance Computer Architecture (HPCA-20), 2014

@article{eltantawy2014scalable,
  title={A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow},
  author={ElTantawy, Ahmed and Ma, Jessica Wenjie and O’Connor, Mike and Aamodt, Tor M},
  year={2014}
}

Graphics processing units (GPUs) are increasingly used for non-graphics computing. However, applications with divergent control flow suffer performance degradation on current GPUs. These GPUs implement the SIMT execution model by serializing the execution of the different control flow paths encountered by a warp. This serialization can mask thread-level parallelism among the scalar threads comprising a warp, thus degrading performance. In this paper, we propose a novel branch divergence handling mechanism that enables interleaved execution of divergent paths within a warp while maintaining immediate postdominator reconvergence. This multi-path microarchitecture decouples divergence and reconvergence tracking by replacing the stack-based structure typically employed to support SIMT execution with two tables: a warp split table and a warp reconvergence table. It also enables reconvergence before the immediate postdominator, which is important for efficient execution of unstructured control flow. Evaluated on a set of benchmarks with complex divergent control flow, our proposal achieves up to a 7x speedup, with a harmonic mean speedup of 32%, over conventional single-path SIMT execution.
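
To make the contrast in the abstract concrete, the following is a minimal Python sketch of the two tracking schemes on a toy if/else diamond. It is an illustration under stated assumptions, not the microarchitecture evaluated in the paper: the program encoding (PROG, RPC, EXIT), the function names, the round-robin scheduling policy, and the choice to key the reconvergence table by reconvergence PC alone are all hypothetical simplifications. It models only reconvergence at the immediate postdominator, not the earlier reconvergence the abstract also mentions.

```python
from collections import deque

EXIT = 6
# Hypothetical toy program: pc -> (label, taken target, not-taken target).
# A is a divergent branch; B0,B1 and C0,C1 are the two paths; D is the
# immediate postdominator of A. Non-branch instructions have equal targets.
PROG = {
    0: ("A", 1, 3),
    1: ("B0", 2, 2), 2: ("B1", 5, 5),
    3: ("C0", 4, 4), 4: ("C1", 5, 5),
    5: ("D", EXIT, EXIT),
}
RPC = {0: 5}  # reconvergence pc (immediate postdominator) of each branch


def next_pcs(pc, mask, taken):
    """Group the lanes in `mask` by their next pc."""
    _, t, nt = PROG[pc]
    out = {}
    for lane in mask:
        out.setdefault(t if lane in taken else nt, set()).add(lane)
    return {k: frozenset(v) for k, v in out.items()}


def run_stack(warp, taken):
    """Single-path model: only the top-of-stack entry may execute."""
    stack = [(0, EXIT, frozenset(warp))]  # entries are (pc, rpc, mask)
    trace = []
    while stack:
        pc, rpc, mask = stack.pop()
        if pc == rpc:  # reached its reconvergence point: just pop
            continue
        trace.append((PROG[pc][0], sorted(mask)))
        targets = next_pcs(pc, mask, taken)
        if len(targets) > 1:  # divergence
            stack.append((RPC[pc], rpc, mask))  # reconvergence entry
            rpc = RPC[pc]
        for nxt, m in sorted(targets.items(), reverse=True):
            stack.append((nxt, rpc, m))
    return trace


def run_multipath(warp, taken):
    """Multi-path model: every split-table entry is schedulable."""
    split_table = deque([(0, EXIT, frozenset(warp))])
    reconv_table = {}  # simplification: keyed by reconvergence pc only
    trace = []
    while split_table:
        pc, rpc, mask = split_table.popleft()  # round-robin (an assumption)
        if pc == rpc:
            if pc != EXIT:
                entry = reconv_table[pc]
                entry["pending"] -= mask  # these lanes have arrived
                if not entry["pending"]:  # all arrived: merged split resumes
                    del reconv_table[pc]
                    split_table.append((pc, entry["rpc"], entry["mask"]))
            continue
        trace.append((PROG[pc][0], sorted(mask)))
        targets = next_pcs(pc, mask, taken)
        if len(targets) > 1:  # divergence: record where and who reconverges
            reconv_table[RPC[pc]] = {"rpc": rpc, "pending": set(mask), "mask": mask}
            rpc = RPC[pc]
        for nxt, m in sorted(targets.items()):
            split_table.append((nxt, rpc, m))
    return trace
```

For a four-lane warp in which lanes 0 and 1 take the branch, run_stack(range(4), {0, 1}) issues A, B0, B1, C0, C1, D, while run_multipath(range(4), {0, 1}) issues A, B0, C0, B1, C1, D. Both reconverge all four lanes at D; the difference is that the split table lets the two paths interleave, which is what allows one path to make progress while the other would be stalled on real hardware.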

Tags: Computer science, Hardware Architecture, Performance

February 9, 2014 by hgpu
