Efficient GPU implementation of the integral histogram
Dept. of Computer Science, University of Missouri-Columbia, Columbia, Missouri; Air Force Research Laboratory, Rome, NY 13441, USA
Workshop on Developer-Centred Computer Vision, LNCS ACCV, 2012
@inproceedings{poostchi2012efficient,
title={Efficient GPU implementation of the integral histogram},
author={Poostchi, Mahdieh and Palaniappan, Kannappan and Bunyak, Filiz and Becchi, Michela and Seetharaman, Guna},
booktitle={LNCS ACCV, Workshop on Developer-Centred Computer Vision},
year={2012}
}
The integral histogram for images is an efficient preprocessing method for speeding up diverse computer vision algorithms, including object detection, appearance-based tracking, recognition and segmentation. Our proposed Graphics Processing Unit (GPU) implementation uses parallel prefix sums on row and column histograms in a cross-weave scan, with high GPU utilization and communication-aware data transfer between CPU and GPU memories. Two different data structures and communication models were evaluated: a 3-D array storing the binned histogram for each pixel, and an equivalent linearized 1-D array, each with a distinct data-movement pattern. The 3-D array, which required many kernel invocations with a low workload per kernel, was inefficient, highlighting the need for careful mapping of sequential algorithms onto the GPU. The reorganized 1-D array, which needs only a single data transfer to the GPU and achieves high GPU utilization, was 60 times faster than the CPU version for a 1K × 1K image (reaching 49 frames/s) and 21 times faster for 512 × 512 images (reaching 194 frames/s). The integral histogram module is applied as part of the Likelihood of Features Tracking (LOFT) system for video object tracking using fusion of multiple cues.
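The following is a small CPU sketch that illustrates the cross-weave scan idea: bin each pixel, then run inclusive prefix sums along rows and then along columns. It is not the authors' GPU code. It is plain Python and makes three assumptions. It uses a bin-major linearized 1-D layout (`b*H*W + y*W + x`), which may differ from the paper's. It bins 8-bit intensities uniformly. The names `integral_histogram` and `region_histogram` are hypothetical. Each row scan in step 2 is independent of the others, and so is each column scan in step 3. That independence is what a GPU version would parallelize; here the loops simply run sequentially.

```python
from typing import List


def integral_histogram(img: List[List[int]], bins: int, max_val: int = 256) -> List[int]:
    """Build a linearized 1-D integral histogram of length bins*H*W.

    Assumed bin-major layout: entry b*H*W + y*W + x holds the number of
    pixels falling in bin b inside the rectangle [0..y] x [0..x].
    Assumes every pixel value lies in [0, max_val).
    """
    h, w = len(img), len(img[0])
    ih = [0] * (bins * h * w)
    # Step 1: binning. Each pixel votes 1 into its own bin plane.
    for y in range(h):
        for x in range(w):
            b = img[y][x] * bins // max_val  # uniform binning (assumption)
            ih[b * h * w + y * w + x] = 1
    # Step 2: horizontal pass. Inclusive prefix sum along every row of every bin.
    for b in range(bins):
        base = b * h * w
        for y in range(h):
            row = base + y * w
            for x in range(1, w):
                ih[row + x] += ih[row + x - 1]
    # Step 3: vertical pass. Inclusive prefix sum down every column of every bin.
    for b in range(bins):
        base = b * h * w
        for y in range(1, h):
            for x in range(w):
                ih[base + y * w + x] += ih[base + (y - 1) * w + x]
    return ih


def region_histogram(ih: List[int], h: int, w: int, bins: int,
                     y0: int, x0: int, y1: int, x1: int) -> List[int]:
    """Histogram of the inclusive rectangle [y0..y1] x [x0..x1], four lookups per bin."""
    def at(b: int, y: int, x: int) -> int:
        # Entries above or left of the image border count as zero.
        if y < 0 or x < 0:
            return 0
        return ih[b * h * w + y * w + x]

    return [at(b, y1, x1) - at(b, y0 - 1, x1) - at(b, y1, x0 - 1) + at(b, y0 - 1, x0 - 1)
            for b in range(bins)]


if __name__ == "__main__":
    img = [[0, 64, 128, 192],
           [255, 10, 70, 130],
           [200, 30, 90, 150]]
    ih = integral_histogram(img, bins=4)
    print(region_histogram(ih, 3, 4, 4, 0, 0, 2, 3))  # -> [3, 3, 3, 3]
    print(region_histogram(ih, 3, 4, 4, 1, 0, 2, 1))  # -> [2, 0, 0, 2]
```

Once the structure is built, the histogram of any axis-aligned rectangle costs four lookups per bin, regardless of the rectangle's size. This constant cost is what makes the integral histogram attractive for sliding-window detection and tracking.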
March 21, 2013 by hgpu