CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures
Electrical and Computer Engineering Department, The University of Texas at Austin
International Symposium on Computer Architecture (ISCA’12), 2012
@inproceedings{rhu2012capri,
author={Minsoo Rhu and Mattan Erez},
title={CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures},
booktitle={to appear in the International Symposium on Computer Architecture (ISCA’12)},
location={Portland, Oregon},
month={June},
year={2012},
pdf={http://lph.ece.utexas.edu/merez/uploads/MattanErez/capri_isca2012.pdf},
mycat={conference}
}
Wide SIMD-based GPUs have evolved into a promising platform for running general purpose workloads. Current programmable GPUs allow even code with irregular control to execute well on their SIMD pipelines. To do this, each SIMD lane is considered to execute a logical thread where hardware ensures that control flow is accurate by automatically applying masked execution. The masked execution, however, often degrades performance because the issue slots of masked lanes are wasted. This degradation can be mitigated by dynamically compacting multiple unmasked threads into a single SIMD unit. This paper proposes a fundamentally new approach to branch compaction that avoids the unnecessary synchronization required by previous techniques and that only stalls threads that are likely to benefit from compaction. Our technique is based on the compaction-adequacy predictor (CAPRI). CAPRI dynamically identifies the compaction-effectiveness of a branch and only stalls threads that are predicted to benefit from compaction. We utilize a simple single-level branch-predictor inspired structure and show that this simple configuration attains a prediction accuracy of 99.8% and 86.6% for non-divergent and divergent workloads, respectively. Our performance evaluation demonstrates that CAPRI consistently outperforms both the baseline design that never attempts compaction and prior work that stalls upon all divergent branches.
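The idea in the abstract can be illustrated with a small Python sketch. It is not the authors' hardware design: the abstract only says the predictor is a simple single-level, branch-predictor-inspired structure. Everything else here is an assumption made for illustration, including the names (WARP_WIDTH, AdequacyPredictor, should_stall, update), the one-history-bit-per-branch table, the default of stalling on a branch that has not been seen before, and the rule that compacted threads stay in their home SIMD lane.

```python
from typing import Dict, List

WARP_WIDTH = 4  # hypothetical SIMD width, kept small for readability


def warps_without_compaction(masks: List[List[bool]]) -> int:
    """Issue slots used when every warp executes under its own active mask."""
    return sum(1 for mask in masks if any(mask))


def warps_with_compaction(masks: List[List[bool]]) -> int:
    """Issue slots used if active threads are repacked into fewer warps.

    Assumption of this sketch: a thread must stay in its home lane, so the
    busiest lane determines how many compacted warps are needed.
    """
    if not masks:
        return 0
    return max(sum(mask[lane] for mask in masks) for lane in range(WARP_WIDTH))


class AdequacyPredictor:
    """One history bit per branch PC (hypothetical table layout)."""

    def __init__(self, default_stall: bool = True) -> None:
        self.table: Dict[int, bool] = {}
        self.default_stall = default_stall  # assumed policy for unseen branches

    def should_stall(self, pc: int) -> bool:
        """Predict whether waiting to compact at this branch is worthwhile."""
        return self.table.get(pc, self.default_stall)

    def update(self, pc: int, masks: List[List[bool]]) -> None:
        """Record whether compaction actually reduced the warp count."""
        saved = warps_with_compaction(masks) < warps_without_compaction(masks)
        self.table[pc] = saved


# Two warps whose active threads sit in different lanes can be merged.
masks_good = [[True, False, False, False], [False, True, False, False]]
# Two warps whose active threads collide in lane 0 cannot be merged.
masks_bad = [[True, False, False, False], [True, False, False, False]]

predictor = AdequacyPredictor()
predictor.update(0x40, masks_bad)
print(predictor.should_stall(0x40))
```

In this sketch, a branch whose last outcome could not be compacted is no longer stalled, which mirrors the abstract's point that only threads predicted to benefit from compaction should wait.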
May 11, 2012 by hgpu