Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions
University of Missouri, 316 University Hall, Columbia, MO 65211-3020
University of Missouri, 2014
@techreport{palaniappan2014multi,
title={Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions},
author={Palaniappan, Kannappan},
year={2014},
institution={DTIC Document}
}
We have implemented, tested, validated and benchmarked scalable parallel implementations of the integral histogram algorithm, which is critical for computer vision tasks such as fast multiscale subwindow-based object search, motion analysis and content-based image retrieval. Several integral histogram kernels using CUDA optimizations for many-core GPUs were investigated. The integral histogram algorithm was also parallelized using the StarSs programming model, in collaboration with the Barcelona Supercomputing Center, for several architectures including Cell/B.E., GPU and SMP. The Cell/B.E. implementation of the integral histogram using the cross-weave scan and 16 bins for a 640×480 image reaches 160 frames/sec using 8 SPEs. The wavefront scan for the same image size reaches almost 200 frames/sec, but its performance depends critically on the block size. The GPU implementation of the integral histogram was 60 times faster than the sequential CPU version for a 1K×1K image, reaching 49 frames/sec, and 21 times faster for 512×512 images, reaching 194 frames/sec. The implemented code has been delivered to AFRL for transition to other programs such as CETE.
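For readers unfamiliar with the integral histogram, the following is a minimal sequential sketch in Python, not the report's CUDA, Cell/B.E. or StarSs code. It assumes a grayscale image given as a list of rows with integer values in [0, max_value), and it computes the prefix sums as a row pass followed by a column pass, which is how the cross-weave scan is commonly described. The function names `integral_histogram` and `region_histogram` and all parameters are hypothetical, chosen for illustration only.

```python
def integral_histogram(image, bins, max_value=256):
    """Return H where H[b][y][x] counts bin-b pixels in rows 0..y-1, columns 0..x-1.

    Illustrative sketch only: names and layout are assumptions, not the report's code.
    """
    height, width = len(image), len(image[0])
    # One (height+1) x (width+1) plane per bin; the zero border simplifies queries.
    H = [[[0] * (width + 1) for _ in range(height + 1)] for _ in range(bins)]
    # Binning: mark each pixel in the plane of the bin it falls into.
    for y in range(height):
        for x in range(width):
            b = image[y][x] * bins // max_value
            H[b][y + 1][x + 1] = 1
    for plane in H:
        # Horizontal pass: prefix sums along each row.
        for y in range(1, height + 1):
            for x in range(1, width + 1):
                plane[y][x] += plane[y][x - 1]
        # Vertical pass: prefix sums along each column.
        for x in range(1, width + 1):
            for y in range(1, height + 1):
                plane[y][x] += plane[y - 1][x]
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1 (four lookups per bin)."""
    return [p[bottom][right] - p[top][right] - p[bottom][left] + p[top][left]
            for p in H]
```

Once the integral histogram is built, the histogram of any rectangular subwindow costs four lookups per bin regardless of the window size, which is what makes multiscale subwindow search fast. The bin planes are independent of one another, and within each pass the rows (or columns) are independent as well, which is one plausible source of the parallelism that the GPU and Cell/B.E. implementations exploit.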
July 24, 2014 by hgpu