Parallel Application Library for Object Recognition

Bor-Yiing Su
Electrical Engineering and Computer Sciences, University of California at Berkeley
University of California at Berkeley, Technical Report No. UCB/EECS-2012-199, 2012







Computer vision research enables machines to understand the world. Humans usually interpret and analyze the world through what they see – the objects they capture with their eyes. Similarly, machines can better understand the world by recognizing objects in images. Object recognition is therefore a major branch of computer vision. To achieve the highest accuracy, state-of-the-art object recognition systems must extract features from hundreds to millions of images, train models with enormous data samples, and deploy those models on query images. As a result, these systems are computationally intensive. In order to make such complicated algorithms practical to apply in real life, we must accelerate them on modern massively parallel platforms. However, parallel programming is complicated and challenging, and takes years to master. In order to help object recognition researchers employ parallel platforms more productively, we propose a parallel application library for object recognition. Researchers can simply call the library functions, and need not understand the technical details of parallelization and optimization. To pave the way for such a library, we perform pattern mining on 31 important object recognition systems, and conclude that 15 application patterns are necessary to cover the computations in these systems. In other words, if we support these 15 application patterns in our library, we can parallelize all 31 object recognition systems. In order to optimize any given application pattern in a systematic way, we propose using patterns and software architectures to explore the design space of algorithms, parallelization strategies, and platform parameters.
In this dissertation, we exhaustively examine the design space for three application patterns, and achieve significant speedups on these patterns – 280x speedup on the eigensolver application pattern, 12-33x speedup on the breadth-first-search graph traversal application pattern, and 5-30x speedup on the contour histogram application pattern. To improve the portability and flexibility of the proposed library, we also initiate the OpenCL for OpenCV project. This project aims to provide a collection of autotuners that optimize the performance of application patterns on many different parallel platforms. We have developed two autotuners in this project. clSpMV is an autotuner for sparse matrix-vector multiplication (SpMV) computation – it tunes the representation of a sparse matrix and the corresponding SpMV kernel, and is 40% faster than the vendor-optimized parallel implementation. clPaDi is an autotuner for the pairwise distance computation application pattern – it allows users to customize their own distance functions, and finds the best blocking size for each function. clPaDi performs 320-650 giga floating-point operations per second on modern GPU platforms. By employing these optimized functions in a state-of-the-art object recognition system, we have achieved 110-120x speedup compared to the original serial implementation. Now it takes only three seconds to identify objects in a query image – a much more practical and useful processing time. Our research makes it possible to deploy complicated object recognition algorithms in real applications. With these encouraging results, we are confident that the methodology we illustrate in this dissertation is applicable to optimizing all application patterns. If we expand the parallel application library to support all 15 application patterns, the library will be a key toolkit for both existing and future object recognition systems.
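To make the pairwise distance pattern concrete, the following is a minimal serial sketch in plain Python. It is not clPaDi's API or implementation: the names `pairwise_distances`, `euclidean`, `chi_squared`, and the `block` parameter are hypothetical and chosen only for illustration. It shows the two ideas the abstract attributes to clPaDi: the caller supplies the distance function, and the output matrix is computed tile by tile, with the tile size exposed as the kind of parameter an autotuner could search over.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_squared(a: Vector, b: Vector) -> float:
    """Chi-squared distance for non-negative histograms.

    Bins where both entries are zero are skipped to avoid dividing by zero.
    """
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)


def pairwise_distances(
    queries: Sequence[Vector],
    references: Sequence[Vector],
    distance: Distance,
    block: int = 2,
) -> List[List[float]]:
    """Return D with D[i][j] = distance(queries[i], references[j]).

    The output is filled in block x block tiles. In this serial sketch the
    tile size does not change the result; on a GPU it would decide how much
    data each work-group reuses, which is why it is worth tuning
    (assumption: `block` stands in for such a tunable parameter).
    """
    if block < 1:
        raise ValueError("block must be a positive integer")
    n, m = len(queries), len(references)
    out = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile row origin
        for j0 in range(0, m, block):      # tile column origin
            for i in range(i0, min(i0 + block, n)):
                for j in range(j0, min(j0 + block, m)):
                    out[i][j] = distance(queries[i], references[j])
    return out


queries = [[0.0, 0.0], [3.0, 4.0]]
references = [[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]]
D = pairwise_distances(queries, references, euclidean, block=2)
```

Swapping `euclidean` for `chi_squared` changes the metric without touching the tiling loop, which is the separation between the user-defined function and the tuned traversal that the abstract describes.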

