high performance computing on graphics processing units: hgpu.org

A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow

Ahmed ElTantawy, Jessica Wenjie Ma, Mike O’Connor, Tor M. Aamodt

University of British Columbia

20th IEEE International Symposium on High-Performance Computer Architecture (HPCA-20), 2014

@article{eltantawy2014scalable,
  title={A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow},
  author={ElTantawy, Ahmed and Ma, Jessica Wenjie and O’Connor, Mike and Aamodt, Tor M},
  year={2014}
}

Graphics processing units (GPUs) are increasingly used for non-graphics computing. However, applications with divergent control flow incur performance degradation on current GPUs. These GPUs implement the SIMT execution model by serializing the execution of the different control flow paths encountered by a warp. This serialization can mask thread-level parallelism among the scalar threads comprising a warp, thus degrading performance. In this paper, we propose a novel branch divergence handling mechanism that enables interleaved execution of divergent paths within a warp while maintaining immediate postdominator reconvergence. This multi-path microarchitecture decouples divergence and reconvergence tracking by replacing the stack-based structure typically employed to support SIMT execution with two tables: a warp split table and a warp reconvergence table. It also enables reconvergence before the immediate postdominator, which is important for efficient execution of unstructured control flow. Evaluated on a set of benchmarks with complex divergent control flow, our proposal achieves up to a 7x speedup, with a harmonic mean speedup of 32%, over conventional single-path SIMT execution.
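The difference between stack-based serialization and table-based interleaving can be illustrated with a toy simulation. The sketch below is not the paper's hardware design: the entry layouts (`Split`, `Reconv`), the 4-thread warp, the block names `B`, `C`, `D`, the block lengths, and the round-robin scheduling are all assumptions made for illustration. It models a single branch whose taken threads run block `B`, whose other threads run block `C`, and which reconverges at block `D`.

```python
from dataclasses import dataclass

FULL = 0b1111                          # toy warp of 4 threads (assumption)
BLOCK_LEN = {"B": 2, "C": 2, "D": 1}   # instructions per block (assumption)

@dataclass
class Split:       # hypothetical warp split table entry
    block: str     # block this split is executing
    idx: int       # next instruction within the block
    rpc: object    # reconvergence block, or None after reconvergence
    mask: int      # active-thread bitmask

@dataclass
class Reconv:      # hypothetical warp reconvergence table entry
    block: str     # block where the splits merge
    mask: int      # threads expected to reconverge
    pending: int   # threads that have not arrived yet

def run_stack(taken_mask):
    """Single-path SIMT: the top-of-stack path runs to completion first."""
    stack = [("D", FULL), ("C", FULL & ~taken_mask), ("B", taken_mask)]
    trace = []
    while stack:
        block, mask = stack.pop()
        if mask:   # skip a path that no thread takes
            trace.extend((f"{block}{i}", mask) for i in range(BLOCK_LEN[block]))
    return trace

def run_multipath(taken_mask):
    """Two tables: splits are scheduled round-robin and merge at D."""
    splits = [Split("B", 0, "D", taken_mask),
              Split("C", 0, "D", FULL & ~taken_mask)]
    splits = [s for s in splits if s.mask]   # drop empty paths
    reconv = [Reconv("D", FULL, FULL)]
    trace = []
    while splits:
        s = splits.pop(0)                    # issue one instruction from the next split
        trace.append((f"{s.block}{s.idx}", s.mask))
        s.idx += 1
        if s.idx < BLOCK_LEN[s.block]:
            splits.append(s)                 # split still has instructions left
        elif s.rpc is not None:
            r = reconv[0]                    # split arrived at its reconvergence point
            r.pending &= ~s.mask
            if r.pending == 0:               # all threads arrived: merged split is schedulable
                reconv.pop(0)
                splits.append(Split(r.block, 0, None, r.mask))
    return trace

print(run_stack(0b0011))      # B0, B1, C0, C1, D0
print(run_multipath(0b0011))  # B0, C0, B1, C1, D0
```

In the stack version, path `C` cannot issue until path `B` has finished, whereas the two-table version alternates between the two splits and still merges all four threads at `D`. The toy only shows issue order; it does not model the memory or pipeline latencies that interleaving would help hide on real hardware.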

Tags: Computer science, Hardware Architecture, Performance

February 9, 2014 by hgpu
