Efficient Parallel Intra-prediction Mode Selection Scheme for 4×4 Blocks in H.264
International Conference on Intelligent Computation Technology and Automation (ICICTA), 2011, pp. 527-530
H.264 uses intra-prediction with 4×4 and 16×16 block sizes for the luma component and an 8×8 block size for the chroma components to improve rate-distortion performance. However, the many intra-prediction modes drastically increase the computational complexity of the H.264 encoder. Recently, efficient hardware architectures have been proposed for fast execution of H.264/AVC intra-prediction mode selection. This paper proposes an efficient pipelining method for intra-prediction mode selection on 4×4 blocks. In particular, we exploit the GPU's streaming architecture for 4×4 intra-prediction mode selection in H.264/AVC, and we develop a special strategy, including instruction optimization and full use of shared memory, to further exploit the fine-grained parallelism of GPUs. Experimental results show up to about a 3x speedup of our GPU-based algorithms over sequential CPU implementations.
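To make the mode-selection step concrete, the following is a minimal sequential sketch in Python, not the paper's GPU implementation. It assumes only three of the nine H.264 4×4 luma modes (0 = vertical, 1 = horizontal, 2 = DC), assumes both the top and left neighbours are available, and uses the sum of absolute differences (SAD) as the cost; the abstract does not state which cost function the authors use. The names `predict_4x4`, `sad`, and `select_mode` are hypothetical.

```python
def predict_4x4(mode, top, left):
    """Build a 4x4 prediction (list of rows) from neighbouring pixels.

    top  : the 4 reconstructed pixels directly above the block
    left : the 4 reconstructed pixels directly to its left
    Only modes 0, 1 and 2 are covered in this sketch.
    """
    if mode == 0:  # vertical: every row repeats the pixels above
        return [list(top) for _ in range(4)]
    if mode == 1:  # horizontal: every row is filled with its left pixel
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded mean of the 8 neighbouring pixels
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("mode not covered by this sketch")


def sad(block, pred):
    """Sum of absolute differences (assumed cost function)."""
    return sum(abs(block[r][c] - pred[r][c])
               for r in range(4) for c in range(4))


def select_mode(block, top, left):
    """Return (best_mode, cost) over the three candidate modes."""
    costs = {m: sad(block, predict_4x4(m, top, left)) for m in (0, 1, 2)}
    best = min(costs, key=costs.get)
    return best, costs[best]


# A block whose rows all equal the pixels above it is matched
# exactly by vertical prediction.
top = [10, 20, 30, 40]
left = [50, 50, 50, 50]
block = [list(top) for _ in range(4)]
print(select_mode(block, top, left))  # (0, 0)
```

The cost of each candidate mode depends only on the block and its neighbours, so the per-mode evaluations are independent of one another; that independence is the kind of fine-grained work that can be evaluated concurrently rather than in a loop.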
May 4, 2011 by hgpu