high performance computing on graphics processing units: hgpu.org

Paralleling Variable Block Size Motion Estimation of HEVC on Multi-Core CPU Plus GPU Platform

Xiang-wen Wang, Li Song, Min Chen, Jun-jie Yang

Shanghai University of Electric Power

2013 IEEE International Conference on Image Processing, 2013

Variable block size motion estimation (VBSME) is one of the most computationally complex modules in the HEVC encoder. The HEVC standard supports up to 12 variable block sizes, ranging from 4×8/8×4 to 64×64, for motion estimation (ME) and motion compensation (MC). Compared with the 7 variable block sizes in H.264/AVC, this feature yields substantial coding gain, but at the cost of huge computational complexity, which makes VBSME the bottleneck for real-time encoding. In this paper, we propose novel strategies for accelerating VBSME in the HEVC encoder in parallel on a platform that combines a multi-core CPU with a many-core GPU. First, a two-stage ME strategy is proposed for dividing the ME task between the CPU and the GPU. Then, a span-wavefront VBSME sequence is designed for efficient synchronization between the CPU threads and the GPU threads. Experimental results show that the HEVC encoder with the proposed strategies reaches about 28 fps for 1080p video with only a small degradation in compression performance.
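To make the block-size counts in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes the 12 HEVC sizes are the symmetric partitions (square, horizontal half, vertical half) from 64×64 down to 8×4/4×8, which matches the stated range and count. It also illustrates one common way to share work across block sizes: compute the sum of absolute differences (SAD) once per 4×4 cell, then add cells to obtain the SAD of any larger partition. Whether the authors use this SAD-merging scheme is not stated in the abstract, and all function names here are hypothetical.

```python
def block_sizes(max_side, min_side=4, include_min_square=False):
    """List (width, height) of square and half-split sizes, largest first.

    Assumption: only symmetric partitions are counted.
    """
    sizes = []
    side = max_side
    while side >= 2 * min_side:
        sizes += [(side, side), (side, side // 2), (side // 2, side)]
        side //= 2
    if include_min_square:
        sizes.append((min_side, min_side))
    return sizes


def sad(cur, ref, x, y, w, h, mvx, mvy):
    """Direct SAD of the w x h block at (x, y) in cur against ref displaced by (mvx, mvy)."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + j + mvy][x + i + mvx])
        for j in range(h)
        for i in range(w)
    )


def sad_grid(cur, ref, x0, y0, size, mvx, mvy, cell=4):
    """SADs of every cell x cell sub-block of the size x size block at (x0, y0)."""
    n = size // cell
    return [
        [sad(cur, ref, x0 + c * cell, y0 + r * cell, cell, cell, mvx, mvy)
         for c in range(n)]
        for r in range(n)
    ]


def merged_sad(grid, bx, by, w, h, cell=4):
    """SAD of the w x h partition at offset (bx, by) inside the block, by summing cells."""
    return sum(
        grid[r][c]
        for r in range(by // cell, (by + h) // cell)
        for c in range(bx // cell, (bx + w) // cell)
    )


# Synthetic 24x24 frames (illustrative data only).
cur = [[(7 * x + 13 * y) % 256 for x in range(24)] for y in range(24)]
ref = [[(5 * x + 11 * y + 3) % 256 for x in range(24)] for y in range(24)]

print(len(block_sizes(64)))                           # 12 sizes (HEVC count in the abstract)
print(len(block_sizes(16, include_min_square=True)))  # 7 sizes (H.264/AVC count)

# One motion vector candidate for the 16x16 block at (4, 4).
grid = sad_grid(cur, ref, 4, 4, 16, mvx=1, mvy=-2)
# The lower 16x8 half, obtained from cells, matches the direct computation.
print(merged_sad(grid, 0, 8, 16, 8) == sad(cur, ref, 4, 12, 16, 8, 1, -2))
```

The 4×4 SAD grid is computed once per motion vector candidate and reused for every partition shape. This kind of sharing is what makes evaluating many block sizes per candidate tractable on a parallel device.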

Tags: Compression, CUDA, H.264/AVC, Image processing, Motion compensation, nVidia, Tesla C2050

September 22, 2013 by hgpu
