high performance computing on graphics processing units: hgpu.org

Frame-based parallelization of MPEG-4 on compute unified device architecture (CUDA)

Dishant Ailawadi, Milan Kumar Mohapatra, Ankush Mittal

Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, India

IEEE 2nd International Advance Computing Conference (IACC), 2010, p.267-272

DOI:10.1109/IADCC.2010.5422997

Owing to its object-based nature, flexible features, and provision for user interaction, the MPEG-4 encoder is highly suitable for parallelization. The most critical and time-consuming operation of the encoder is motion estimation. Nvidia’s general-purpose graphics processing unit (GPGPU) architecture provides a massively parallel stream-processor model at a very low price (a few thousand rupees). However, synchronizing the parallel calculations and repeatedly transferring data from device to host are major challenges in parallelizing motion estimation on CUDA. Our solution employs optimized and balanced parallelization of motion estimation on CUDA. This paper discusses frame-based parallelization, in which parallelization is done at two levels: the macroblock level and the search-range level. We propose a further division of the macroblock to optimize parallelization. As the experimental results demonstrate, our algorithm supports real-time processing and streaming for key applications such as e-learning, telemedicine, and video-surveillance systems.
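
The two levels of parallelism named in the abstract can be illustrated with a minimal sequential sketch of full-search block matching. This is not the authors' implementation: the sum-of-absolute-differences (SAD) cost, the function names, the 4x4 block size, and the tiny 8x8 test frames are all assumptions chosen for illustration. The point is only that every macroblock, and every candidate displacement within the search range, is scored independently of the others, which is what makes a mapping onto GPU thread blocks and threads plausible.

```python
# Hypothetical sketch: full-search motion estimation with a SAD cost.
# All names and sizes are illustrative assumptions, not taken from the paper.

def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the
    block of `ref` displaced from it by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Search-range level: each candidate (dx, dy) is scored independently;
    picking the minimum is a reduction over those scores."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def estimate_motion(cur, ref, n=4, search=2):
    """Macroblock level: the search for one block does not depend on any other block."""
    h, w = len(cur), len(cur[0])
    return {(bx, by): best_motion_vector(cur, ref, bx, by, n, search)
            for by in range(0, h - n + 1, n)
            for bx in range(0, w - n + 1, n)}


# Reference frame with a non-repeating pattern; the current frame is the
# same content shifted one pixel to the right.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]

vectors = estimate_motion(cur, ref)
print(vectors[(4, 4)])  # (0, -1, 0): zero cost, match lies one pixel to the left
```

In a CUDA version of this sketch, one natural (but assumed) mapping would give each macroblock to a thread block and each candidate displacement to a thread, with the minimum selected by a reduction inside the thread block. That reduction step is where the synchronization concern raised in the abstract would arise.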

Tags: CUDA, Image processing, nVidia, nVidia Quadro FX 370, Video decoding

March 8, 2011 by hgpu
