Implementations of the Hough Transform on the Embedded Multicore Processors
Department of Information Engineering, Hiroshima University, Kagamiyama 1-4-1, Higashi-Hiroshima, Hiroshima, 739-8527 Japan
International Journal of Networking and Computing, Volume 4, Number 1, pages 174-188, 2014
@article{zhou2014implementations,
title={Implementations of the Hough Transform on the Embedded Multicore Processors},
author={Zhou, Xin and Tomagou, Norihiro and Ito, Yasuaki and Nakano, Koji},
journal={International Journal of Networking and Computing},
volume={4},
number={1},
pages={174--188},
year={2014}
}
Embedded multicore processors such as FPGAs and GPUs have lately attracted considerable attention for their computing capability and low power consumption. Recent FPGAs have hundreds of embedded DSP slices and block RAMs. For example, Xilinx Virtex-6 family FPGAs have DSP48E1 slices, which are configurable logic blocks equipped with fast multipliers, adders, pipeline registers, and so on. They also have 18-Kbit dual-port memories as block RAMs. Meanwhile, recent GPUs can be used for general-purpose computation, and users can develop parallel programs that run on them using CUDA, the programming architecture provided by NVIDIA. The main contribution of this paper is to present two implementations of the Hough transform, one on the FPGA and one on the GPU. The first key idea of the implementations is the efficient use of DSP slices and block RAMs on FPGAs, and of shared memory on GPUs. The second idea is to partition the voting space of the Hough transform so that the voting operations are performed in parallel. The implementation results show that the Hough transform for a 512×512 image with 33,232 edge points can be computed in 135.75 µs on the FPGA and in 637.88 µs on the GPU. By comparison, a conventional CPU implementation runs in 37.10 ms. Thus, both implementations achieve a substantial speed-up, roughly 273× for the FPGA and 58× for the GPU.
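The partitioned voting idea can be illustrated with a small sketch. This is a minimal Python sketch, assuming the standard line parameterization ρ = x cos θ + y sin θ and a voting space split into contiguous slices along the θ axis; the paper's actual partitioning scheme, bin sizes, and hardware mapping may differ. The function names `hough_partition` and `hough_transform` and the parameters `theta_bins` and `parts` are hypothetical. Each worker owns its own θ rows, so no two workers ever write the same accumulator cell and no locking is needed. The worker threads merely stand in for parallel hardware units (for instance, DSP slices paired with block RAMs, or GPU thread blocks using shared memory).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def hough_partition(edges, thetas, rho_max, rho_bins, t_lo, t_hi):
    """Accumulate votes for theta indices t_lo..t_hi-1 only (one voting-space partition)."""
    scale = (rho_bins - 1) / (2.0 * rho_max)  # maps rho in [-rho_max, rho_max] to [0, rho_bins-1]
    rows = []
    for t in range(t_lo, t_hi):
        c, s = math.cos(thetas[t]), math.sin(thetas[t])
        row = [0] * rho_bins  # private accumulator row: never shared, so no locks
        for x, y in edges:
            rho = x * c + y * s  # always lies in [-rho_max, rho_max]
            row[int(round((rho + rho_max) * scale))] += 1
        rows.append(row)
    return rows


def hough_transform(edges, width, height, theta_bins=180, parts=4):
    """Line Hough transform with the theta axis split into `parts` slices voted in parallel.

    Hypothetical helper for illustration; returns (accumulator, thetas, rho_max),
    where accumulator[t][r] counts the edge points voting for (thetas[t], rho bin r).
    """
    rho_max = max(math.hypot(width - 1, height - 1), 1.0)  # guard against a 1x1 image
    rho_bins = 2 * math.ceil(rho_max) + 1
    thetas = [math.pi * t / theta_bins for t in range(theta_bins)]  # theta in [0, pi)
    bounds = [theta_bins * p // parts for p in range(parts + 1)]  # slice boundaries
    with ThreadPoolExecutor(max_workers=parts) as pool:
        futures = [
            pool.submit(hough_partition, edges, thetas, rho_max, rho_bins,
                        bounds[p], bounds[p + 1])
            for p in range(parts)
        ]
        # Concatenating the slices in order rebuilds the full theta x rho accumulator.
        acc = [row for f in futures for row in f.result()]
    return acc, thetas, rho_max


if __name__ == "__main__":
    edges = [(10, y) for y in range(32)]  # edge points on the vertical line x = 10
    acc, thetas, _ = hough_transform(edges, 32, 32)
    t_best = max(range(len(acc)), key=lambda t: max(acc[t]))
    print(t_best, max(acc[t_best]))
```

For example, the 32 edge points of the vertical line x = 10 in a 32×32 image all map to ρ = 10 in the θ = 0 row, producing a peak of 32 votes there. Because of Python's global interpreter lock, the threads do not actually speed up this CPU-bound loop; the sketch only shows that the θ partitions are independent and can therefore be voted concurrently.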
January 11, 2014 by hgpu