high performance computing on graphics processing units: hgpu.org


Communication-Minimizing 2D Convolution in GPU Registers

Forrest N. Iandola, David Sheffield, Michael Anderson, Phitchaya Mangpo Phothilimthana, Kurt Keutzer

Parallel Computing Laboratory (ParLab), University of California, Berkeley, CA, USA

International Conference on Image Processing (ICIP), 2013

@article{iandola2013communication,
  title={COMMUNICATION-MINIMIZING 2D CONVOLUTION IN GPU REGISTERS},
  author={Iandola, Forrest N and Sheffield, David and Anderson, Michael and Phothilimthana, Phitchaya Mangpo and Keutzer, Kurt},
  year={2013}
}


2D image convolution is ubiquitous in image processing and computer vision problems such as feature extraction. Exploiting parallelism is a common strategy for accelerating convolution. Parallel processors keep getting faster, but algorithms such as image convolution remain memory-bound on parallel processors such as GPUs. Therefore, reducing memory communication is fundamental to accelerating image convolution. To reduce memory communication, we reorganize the convolution algorithm to prefetch image regions into registers, and we do more work per thread with fewer threads. To enable portability to future architectures, we implement a convolution autotuner that sweeps the design space of memory layouts and loop unrolling configurations. We focus on convolution with small filters (2×2 to 7×7), but our techniques can be extended to larger filter sizes. Depending on filter size, our speedups on two NVIDIA architectures range from 1.2x to 4.5x over state-of-the-art GPU libraries.
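The idea of prefetching an image region and doing more work per thread can be illustrated with a small CPU-side sketch. This is not the authors' GPU kernel: the function names `conv2d_naive` and `conv2d_tiled`, the `tile` parameter, and the read counters are all hypothetical, and a Python list stands in for a thread's registers. The sketch only counts how many image reads each strategy issues, and it applies the filter without flipping it (correlation style), which is an assumption made for brevity.

```python
def conv2d_naive(image, filt):
    """Valid-mode filtering where every output re-reads its full K x K window."""
    h, w, k = len(image), len(image[0]), len(filt)
    out_h, out_w = h - k + 1, w - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    loads = 0  # number of reads from the (slow) image memory
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[y + i][x + j] * filt[i][j]
                    loads += 1
            out[y][x] = acc
    return out, loads


def conv2d_tiled(image, filt, tile=4):
    """Same result, but each 'thread' prefetches one region and computes a
    whole tile x tile block of outputs from that local copy."""
    h, w, k = len(image), len(image[0]), len(filt)
    out_h, out_w = h - k + 1, w - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    loads = 0
    for ty in range(0, out_h, tile):        # one iteration ~ one thread
        for tx in range(0, out_w, tile):
            th = min(tile, out_h - ty)      # clip tiles at the image border
            tw = min(tile, out_w - tx)
            # Prefetch the tile plus its (k - 1)-wide halo exactly once.
            regs = [[image[ty + r][tx + c] for c in range(tw + k - 1)]
                    for r in range(th + k - 1)]
            loads += (th + k - 1) * (tw + k - 1)
            # All further reads hit the local copy, not image memory.
            for y in range(th):
                for x in range(tw):
                    acc = 0.0
                    for i in range(k):
                        for j in range(k):
                            acc += regs[y + i][x + j] * filt[i][j]
                    out[ty + y][tx + x] = acc
    return out, loads
```

For a 10×10 image and a 3×3 filter with `tile=4`, the naive version issues 64 × 9 = 576 image reads, while the tiled version issues 4 × 36 = 144, with identical outputs. Larger tiles reuse more data per read but need more registers per thread, which is the kind of trade-off an autotuner would have to explore.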

Tags: Algorithms, Computer vision, CUDA, Image processing, nVidia, nVidia GeForce GTX 680, Prefetch, Tesla C2050

April 17, 2013 by hgpu

