high performance computing on graphics processing units: hgpu.org


Accelerating Cost Aggregation for Real-Time Stereo Matching

Jianbin Fang, Ana Lucia Varbanescu, Jie Shen, Henk Sips, Gorkem Saygili, Laurens van der Maaten

Parallel and Distributed Systems Group, Delft University of Technology, Delft, the Netherlands

IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS’12), 2012


Real-time stereo matching, which is important in many applications such as self-driving cars and 3-D scene reconstruction, demands substantial computational power and high memory bandwidth. The most time-consuming part of stereo matching algorithms is the aggregation of information (i.e., costs) over local image regions. In this paper, we present a generic representation and suitable implementations of three commonly used cost aggregators on many-core processors. We apply standard optimizations to the kernels, which yield significant performance improvements (up to two orders of magnitude). Finally, we present a performance model for the three aggregators that predicts the aggregation speed for a given pair of input images on a given architecture. Experimental results validate the model within an acceptable error margin (10.4% on average). We conclude that GPU-like many-core processors are excellent platforms for accelerating stereo matching.
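The abstract does not name the three aggregators, so the following is only an illustration of what "aggregating costs over local image regions" means. It is a minimal sketch, assuming an absolute-difference matching cost, a fixed square window (box filter) computed with a summed-area table, and winner-takes-all disparity selection. The function names, the border penalty, and the window radius are hypothetical choices for illustration, not the paper's GPU kernels.

```python
from typing import List

# Hypothetical helpers for illustration; not the paper's implementation.
Image = List[List[int]]
Plane = List[List[float]]
Volume = List[Plane]


def matching_cost(left: Image, right: Image, max_disp: int, border_cost: float = 255.0) -> Volume:
    """Per-pixel absolute-difference cost C[d][y][x] = |L(x, y) - R(x - d, y)|."""
    h, w = len(left), len(left[0])
    vol: Volume = []
    for d in range(max_disp + 1):
        plane = []
        for y in range(h):
            row = []
            for x in range(w):
                # Matches that would fall outside the right image get a fixed (assumed) penalty.
                row.append(abs(left[y][x] - right[y][x - d]) if x - d >= 0 else border_cost)
            plane.append(row)
        vol.append(plane)
    return vol


def box_aggregate(plane: Plane, radius: int) -> Plane:
    """Sum costs over a (2r+1)x(2r+1) window, clipped at the border, using a summed-area table."""
    h, w = len(plane), len(plane[0])
    # sat[y][x] holds the sum of plane[0..y-1][0..x-1].
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = plane[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Four table lookups give the window sum regardless of the window size.
            row.append(sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0])
        out.append(row)
    return out


def winner_takes_all(vol: Volume) -> List[List[int]]:
    """For each pixel, pick the disparity with the lowest aggregated cost."""
    h, w = len(vol[0]), len(vol[0][0])
    return [[min(range(len(vol)), key=lambda d: vol[d][y][x]) for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    r = [10, 50, 90, 30, 70, 20, 60, 40, 80, 0]
    l = [25, 35] + r[:8]  # left view = right view shifted by 2 pixels
    right = [r[:] for _ in range(3)]
    left = [l[:] for _ in range(3)]
    vol = matching_cost(left, right, max_disp=3)
    disp = winner_takes_all([box_aggregate(p, radius=1) for p in vol])
    print(disp[1])
```

Each output window sum depends only on the cost plane and four table entries, so every pixel and every disparity can be processed independently. That independence is what makes this step a natural fit for data-parallel many-core hardware, while the repeated reads of the cost volume are one reason the abstract stresses memory bandwidth.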

Tags: Algorithms, Image processing, nVidia, nVidia Quadro FX 5000, nVidia Quadro NVS 140 M, OpenCL, Optimization

October 13, 2012 by hgpu
