Efficient Parallel Intra-prediction Mode Selection Scheme for 4×4 Blocks in H.264
International Conference on Intelligent Computation Technology and Automation (ICICTA), 2011, pp. 527-530
@conference{jiao2011efficient,
title={Efficient Parallel Intra-prediction Mode Selection Scheme for 4×4 Blocks in H.264},
author={Jiao, L. and Zhou, J. and Chen, R.},
booktitle={Intelligent Computation Technology and Automation (ICICTA), 2011 International Conference on},
volume={2},
pages={527--530},
organization={IEEE},
year={2011}
}
H.264 uses intra-prediction with 4×4 and 16×16 block sizes for the luma component and an 8×8 block size for the chroma component to improve rate-distortion performance. However, the large number of intra-prediction modes drastically increases the computational complexity of the H.264 encoder. Recently, efficient hardware architectures have been proposed for fast H.264/AVC intra-prediction mode selection. This paper proposes an efficient pipelining method for 4×4 block intra-prediction mode selection. In particular, we exploit the GPU’s streaming architecture for 4×4 intra-prediction mode selection in H.264/AVC, and we develop a strategy that combines instruction optimization with full use of shared memory to further exploit the fine-grained parallelism of GPUs. Experimental results show a speedup of up to about 3× for our GPU-based algorithms over sequential CPU implementations.
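For readers unfamiliar with 4×4 intra-prediction mode selection, the following is a minimal sequential sketch of the kind of per-block work that a GPU implementation would parallelize. It is not the authors' algorithm. It covers only three of the nine H.264 Intra 4×4 modes (vertical, horizontal, DC), assumes the top and left neighbor pixels are all available, and uses the sum of absolute differences (SAD) as the cost, which is an assumption since the abstract does not name the cost function. The function names `predict_4x4`, `sad`, and `select_mode` are hypothetical.

```python
# Illustrative sketch only: picks the cheapest of three Intra 4x4 modes by SAD.
# Assumptions: top/left neighbors are available; SAD is the cost metric.

def predict_4x4(mode, top, left):
    """Build a 4x4 prediction (list of rows) from 4 top and 4 left neighbor pixels."""
    if mode == 0:  # vertical: each column repeats the pixel above it
        return [list(top) for _ in range(4)]
    if mode == 1:  # horizontal: each row repeats the pixel to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbor pixels
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("mode not covered by this sketch")


def sad(block, pred):
    """Sum of absolute differences between a 4x4 block and its prediction."""
    return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))


def select_mode(block, top, left, modes=(0, 1, 2)):
    """Return (best_mode, costs); ties go to the lowest-numbered mode."""
    costs = {m: sad(block, predict_4x4(m, top, left)) for m in modes}
    best = min(modes, key=lambda m: costs[m])
    return best, costs


# A block whose rows all equal the row above it is predicted exactly by mode 0.
block = [[10, 20, 30, 40] for _ in range(4)]
best, costs = select_mode(block, top=[10, 20, 30, 40], left=[25, 25, 25, 25])
print(best, costs)
```

The cost of each candidate mode is computed independently of the others, which is one place where fine-grained parallelism of the sort mentioned in the abstract could plausibly be applied.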
May 4, 2011 by hgpu