H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)
Department of Electronics Engineering, National Chiao-Tung University, Taiwan
Multimedia and Expo, 2008 IEEE International Conference on (2008), pp. 697-700
@conference{chen2008h,
title={H.264/AVC motion estimation implementation on compute unified device architecture (CUDA)},
author={Chen, W.N. and Hang, H.M.},
booktitle={Multimedia and Expo, 2008 IEEE International Conference on},
pages={697–700},
year={2008},
organization={IEEE}
}
Due to the rapid growth of graphics processing unit (GPU) processing capability, using the GPU as a coprocessor to assist the central processing unit (CPU) with massive data computations has become essential. In this paper, we present an efficient block-level parallel algorithm for the variable block size motion estimation (ME) in H.264/AVC with fractional pixel refinement on the compute unified device architecture (CUDA) platform, developed by NVIDIA in 2007. CUDA enhances the programmability and flexibility of general-purpose computation on the GPU. We decompose the H.264 ME algorithm into 5 steps so that we can achieve highly parallel computation with a low external memory transfer rate. Experimental results show that, with the assistance of the GPU, the processing time is 12 times faster than using the CPU alone.
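To picture the first of the decomposed steps described in the abstract, the sketch below shows how the sum of absolute differences (SAD) for every 4x4 sub-block and every candidate motion vector can be evaluated in parallel on CUDA. This is a minimal illustration, not the authors' implementation: the frame size, search range, memory layout, and kernel organization (one thread block per 4x4 block, one thread per candidate) are assumptions made for the example. Larger block sizes can later be obtained by merging these 4x4 SADs, which is the role of the subsequent steps.

// Hypothetical sketch (not the paper's code): parallel 4x4 SAD computation,
// the typical first step of block-level parallel ME on CUDA.
// Frame dimensions, search range and layout below are assumptions.
#include <cstdio>
#include <cuda_runtime.h>

#define WIDTH   64                        // assumed frame width in pixels
#define HEIGHT  64                        // assumed frame height
#define RANGE   8                         // assumed search range [-8, +7]
#define CANDS   (2*RANGE * 2*RANGE)       // candidate MVs per 4x4 block

// One thread evaluates one (4x4 block, candidate MV) pair and stores its SAD.
// 8x8, 16x16, ... SADs can be formed afterwards by summing these results.
__global__ void sad4x4(const unsigned char* cur, const unsigned char* ref,
                       unsigned int* sad)
{
    int blk  = blockIdx.x;                 // index of the 4x4 block
    int cand = threadIdx.x;                // index of the candidate MV
    int blksPerRow = WIDTH / 4;
    int bx = (blk % blksPerRow) * 4;       // top-left corner of current block
    int by = (blk / blksPerRow) * 4;
    int mvx = cand % (2 * RANGE) - RANGE;  // candidate displacement in x
    int mvy = cand / (2 * RANGE) - RANGE;  // candidate displacement in y

    unsigned int acc = 0;
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x) {
            int cx = bx + x,   cy = by + y;
            int rx = cx + mvx, ry = cy + mvy;
            // clamp reference coordinates at the frame border
            rx = min(max(rx, 0), WIDTH - 1);
            ry = min(max(ry, 0), HEIGHT - 1);
            acc += abs((int)cur[cy * WIDTH + cx] - (int)ref[ry * WIDTH + rx]);
        }
    sad[blk * CANDS + cand] = acc;
}

int main()
{
    const int nBlks = (WIDTH / 4) * (HEIGHT / 4);
    unsigned char *cur, *ref;
    unsigned int  *sad;
    cudaMallocManaged(&cur, WIDTH * HEIGHT);
    cudaMallocManaged(&ref, WIDTH * HEIGHT);
    cudaMallocManaged(&sad, nBlks * CANDS * sizeof(unsigned int));
    for (int i = 0; i < WIDTH * HEIGHT; ++i) { cur[i] = i & 0xFF; ref[i] = (i * 7) & 0xFF; }

    // one thread block per 4x4 block, one thread per candidate MV
    sad4x4<<<nBlks, CANDS>>>(cur, ref, sad);
    cudaDeviceSynchronize();
    printf("SAD of block 0, zero-MV candidate: %u\n",
           sad[0 * CANDS + RANGE * 2 * RANGE + RANGE]);

    cudaFree(cur); cudaFree(ref); cudaFree(sad);
    return 0;
}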