An Efficient Parallel Motion Estimation Algorithm and X264 Parallelization in CUDA
School of EECS, Seoul National University, Seoul, Korea
2011 Conference on Design and Architectures for Signal and Image Processing, 2011
@inproceedings{ko2011efficient,
title={An Efficient Parallel Motion Estimation Algorithm and {X264} Parallelization in {CUDA}},
author={Ko, Y. and Yi, Y. and Ha, S.},
booktitle={2011 Conference on Design and Architectures for Signal and Image Processing},
year={2011}
}
H.264/AVC video encoders have been widely used for their high coding efficiency. Since the computational demand, which is proportional to the frame resolution, is constantly increasing, there has been great interest in accelerating H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grain data parallelism. Despite extensive research effort to use GPUs to accelerate the H.264/AVC algorithm, these efforts have not achieved any speed-up over x264, which is known as the fastest CPU implementation, because of the significant communication overhead between the host CPU and the GPU and the intra-frame dependency in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called subframe ME processing, that effectively hides the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves more than 20% speed-up compared with x264.
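For readers unfamiliar with motion estimation, the following is a minimal sketch of a textbook full-search block-matching baseline using the sum of absolute differences (SAD). It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names `sad` and `full_search`, the block size `n`, and the search range `search` are hypothetical and chosen only for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=4, search=2):
    """Return (best_sad, (dx, dy)) over every displacement within
    +/- `search` pixels whose block lies fully inside `ref`.
    Illustrative baseline only (assumption), not the paper's method."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            # Keep the first candidate with the strictly lowest cost.
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

In this baseline, the cost of each candidate displacement is computed independently of the others, which is the kind of fine-grain data parallelism a GPU can exploit. How the paper's algorithm handles the intra-frame dependency and the CPU-GPU transfer is not shown here.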
December 7, 2011 by hgpu