high performance computing on graphics processing units: hgpu.org

CUDA Enhanced Filtering in a Pipelined Video Processing Framework

Austin Aaron Dworaczyk Wiltshire

California Polytechnic State University, San Luis Obispo

California Polytechnic State University, 2013

@article{dworaczyk2013cuda,
  title={CUDA ENHANCED FILTERING IN A PIPELINED VIDEO PROCESSING FRAMEWORK},
  author={Dworaczyk Wiltshire, Austin Aaron},
  year={2013}
}

The processing of digital video has long been a significant computational task for modern x86 processors. With every video frame composed of one to three planes, each consisting of a two-dimensional array of pixel data, and a video clip comprising thousands of such frames, the sheer volume of data is significant. With the introduction of new high-definition video formats such as 4K or stereoscopic 3D, the volume of uncompressed frame data is growing ever larger. Modern CPUs offer performance enhancements for processing digital video through SIMD instruction sets such as SSE2 or AVX. However, even with these instruction sets, CPUs are limited by their inherently sequential design and can operate on only a handful of bytes in parallel. Even processors with a multitude of cores achieve only an elementary level of parallelism. GPUs provide an alternative, massively parallel architecture. GPUs differ from CPUs by providing thousands of throughput-oriented cores, instead of a maximum of tens of generalized "good enough at everything" x86 cores. The GPU's throughput-oriented cores are far more adept at handling large arrays of pixel data, as many video filtering operations can be performed independently. This computational independence allows pixel processing to scale across hundreds or even thousands of device cores. This thesis explores the utilization of GPUs for video processing and evaluates the advantages and caveats of porting the modern video filtering framework VapourSynth to run entirely on the GPU. Compute-heavy GPU-enabled video processing results in up to a 108% speedup over an SSE2-optimized, multithreaded CPU implementation.
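The computational independence described in the abstract can be illustrated with a small sketch. This is not code from the thesis or from VapourSynth: the function names, the brightness-offset filter, and the test plane are all hypothetical, chosen only to show that a point filter gives the same result whether a plane is processed whole or split into bands that are filtered separately. Python threads are used purely to show the decomposition; they say nothing about GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_pixel(value, offset=16):
    """Hypothetical point filter: the output depends on one input sample only."""
    return min(255, value + offset)  # clamp to the 8-bit range


def filter_rows(rows):
    """Apply the point filter to every sample in a list of rows."""
    return [[brighten_pixel(v) for v in row] for row in rows]


def filter_plane_serial(plane):
    """Filter the whole plane in one pass."""
    return filter_rows(plane)


def filter_plane_parallel(plane, workers=4):
    """Split the plane into row bands and filter each band independently."""
    band = max(1, (len(plane) + workers - 1) // workers)
    bands = [plane[i:i + band] for i in range(0, len(plane), band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(filter_rows, bands)  # map preserves band order
    return [row for part in results for row in part]


# Hypothetical 8x6 plane of 8-bit samples (illustrative data only).
plane = [[(x * 37 + y * 11) % 256 for x in range(8)] for y in range(6)]

# Because no output sample depends on a neighbouring sample or on
# processing order, both strategies must agree.
assert filter_plane_parallel(plane) == filter_plane_serial(plane)
```

Filters that read neighbouring pixels, such as blurs, can be decomposed in a similar way, but each band then also needs read access to the rows bordering it.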

Tags: CUDA, Filtering, Image processing, nVidia, nVidia GeForce GTX 780, nVidia GeForce GTX Titan, Thesis

October 2, 2013 by hgpu
