high performance computing on graphics processing units: hgpu.org


Branch and Data Herding: Reducing Control and Memory Divergence for Error-tolerant GPU Applications

John Sartori, Rakesh Kumar

Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801

IEEE Transactions on Multimedia, TMM, 2012

@article{sartori2012branch,
  title={Branch and Data Herding: Reducing Control and Memory Divergence for Error-tolerant GPU Applications},
  author={Sartori, J. and Kumar, R.},
  year={2012}
}


Control and memory divergence between threads within the same execution bundle, or warp, have been shown to cause significant performance bottlenecks for GPU applications. In this paper, we exploit the observation that many GPU applications exhibit error tolerance to propose branch and data herding. Branch herding eliminates control divergence by forcing all threads in a warp to take the same control path. Data herding eliminates memory divergence by forcing each thread in a warp to load from the same memory block. To safely and efficiently support branch and data herding, we propose a static analysis and compiler framework to prevent exceptions when control and data errors are introduced, a profiling framework that aims to maximize performance while maintaining acceptable output quality, and hardware optimizations to improve the performance benefits of exploiting error tolerance through branch and data herding. Our software implementation of branch herding on NVIDIA GeForce GTX 480 improves performance by up to 34% (13%, on average) for a suite of NVIDIA CUDA SDK and Parboil [16] benchmarks. Our hardware implementation of branch herding improves performance by up to 55% (30%, on average). Data herding improves performance by up to 32% (25%, on average). Observed output quality degradation is minimal for several applications that exhibit error tolerance, especially for visual computing applications.
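As a rough illustration of the two transformations described in the abstract, the Python sketch below simulates a single 32-thread warp on the CPU. It is not the authors' implementation: the majority-vote rule for choosing the herded branch, the most-popular-block rule for choosing the herded memory block, the 128-byte block size, and all function names are assumptions made for this example.

```python
from collections import Counter

WARP_SIZE = 32      # threads per warp on NVIDIA GPUs
BLOCK_BYTES = 128   # assumed memory-block size, for illustration only


def herd_branch(predicates):
    """Force every thread in the warp onto one control path.

    Assumption: the path is chosen by majority vote (ties go to "taken").
    Threads in the minority execute the "wrong" path, which is the
    control error the application is expected to tolerate.
    """
    taken = sum(1 for p in predicates if p)
    decision = taken * 2 >= len(predicates)
    return [decision] * len(predicates)


def count_blocks(addresses, block_bytes=BLOCK_BYTES):
    """Number of distinct memory blocks touched by the warp's loads."""
    return len({a // block_bytes for a in addresses})


def herd_addresses(addresses, block_bytes=BLOCK_BYTES):
    """Force every thread in the warp to load from one memory block.

    Assumption: the target is the block requested by the most threads.
    Each redirected thread keeps its offset within the block, so it
    reads a nearby but possibly incorrect value (a data error).
    """
    blocks = Counter(a // block_bytes for a in addresses)
    target = blocks.most_common(1)[0][0]
    return [target * block_bytes + (a % block_bytes) for a in addresses]


# A warp in which 24 threads want the branch taken and 8 do not.
predicates = [i % 4 != 0 for i in range(WARP_SIZE)]
herded = herd_branch(predicates)
forced = sum(1 for p, h in zip(predicates, herded) if p != h)

# A warp in which 30 loads fall in one block and 2 fall elsewhere.
addresses = [1024 + 4 * i for i in range(30)] + [8192, 16384]
herded_addresses = herd_addresses(addresses)

print("distinct paths before/after:", len(set(predicates)), len(set(herded)))
print("threads forced onto another path:", forced)
print("blocks touched before/after:",
      count_blocks(addresses), count_blocks(herded_addresses))
```

In this toy model the warp executes one path instead of two and touches one memory block instead of three, at the cost of eight threads taking a path they did not choose and two threads reading from a different address. Whether such errors are acceptable depends on the application, which is why the abstract pairs herding with static analysis and quality profiling.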

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia GeForce GTX 480

June 17, 2012 by hgpu
