OpenCL-based optimizations for acceleration of object tracking on FPGAs and GPUs
ECE Department of Northeastern University
International Workshop on Architectures and Systems for Real-time Mobile Vision Applications (ASR-MOV), 2016
@conference{Momeni_ASRMOV_2016,
title={OpenCL-based optimizations for acceleration of object tracking on FPGAs and GPUs},
booktitle={International Workshop on Architectures and Systems for Real-time Mobile Vision Applications (ASR-MOV)},
year={2016},
month={mar},
address={Barcelona, Spain},
author={Amir Momeni and Hamed Tabkhi and Gunar Schirner and David R. Kaeli}
}
OpenCL support across many heterogeneous nodes (FPGAs, GPUs, CPUs) has increased the programmability of these systems significantly. At the same time, it opens up new challenges and design choices for system designers and application programmers. While OpenCL offers universal semantics for capturing the parallel behavior of applications independently of the target architecture, some customization should take place at the source level to increase efficiency on the target platform. In this paper, we study the impact of source-level optimizations on the overall execution time of OpenCL programs on heterogeneous systems. We focus on the Meanshift Object Tracking (MSOT) algorithm as a highly challenging, compute-intensive vision kernel. We propose a new vertical classification for selecting the grain of parallelism for the MSOT algorithm across two mainstream architecture classes (GPUs and FPGAs). Our results show that both fine-grained and coarse-grained parallelism can greatly benefit GPU execution (up to a 6X speed-up), while the FPGA can only benefit from fine-grained parallelism (up to a 4X speed-up). However, the FPGA can benefit substantially from executing both the parallel and serial parts of the program on the device (up to a 21X speed-up).
November 13, 2016 by hgpu