CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures

Minsoo Rhu, Mattan Erez

Electrical and Computer Engineering Department, The University of Texas at Austin

International Symposium on Computer Architecture (ISCA’12), 2012

Wide SIMD-based GPUs have evolved into a promising platform for running general-purpose workloads. Current programmable GPUs allow even code with irregular control to execute well on their SIMD pipelines. To do this, each SIMD lane is considered to execute a logical thread, and hardware ensures that control flow is accurate by automatically applying masked execution. Masked execution, however, often degrades performance because the issue slots of masked lanes are wasted. This degradation can be mitigated by dynamically compacting multiple unmasked threads into a single SIMD unit. This paper proposes a fundamentally new approach to branch compaction that avoids the unnecessary synchronization required by previous techniques and that only stalls threads that are likely to benefit from compaction. Our technique is based on the compaction-adequacy predictor (CAPRI). CAPRI dynamically identifies the compaction effectiveness of a branch and only stalls threads that are predicted to benefit from compaction. We utilize a simple single-level, branch-predictor-inspired structure and show that this simple configuration attains a prediction accuracy of 99.8% and 86.6% for non-divergent and divergent workloads, respectively. Our performance evaluation demonstrates that CAPRI consistently outperforms both the baseline design that never attempts compaction and prior work that stalls upon all divergent branches.

Tags: Computer science, CUDA, nVidia, nVidia Quadro FX 5800, Performance, Programming techniques

May 11, 2012 by hgpu
