10632

CUDA Enhanced Filtering in a Pipelined Video Processing Framework

Austin Aaron Dworaczyk Wiltshire
California Polytechnic State University, San Luis Obispo
California Polytechnic State University, 2013

@article{dworaczyk2013cuda,

   title={CUDA ENHANCED FILTERING IN A PIPELINED VIDEO PROCESSING FRAMEWORK},

   author={Dworaczyk Wiltshire, Austin Aaron},

   year={2013}

}

Download Download (PDF)   View View   Source Source   

1655

views

The processing of digital video has long been a significant computational task for modern x86 processors. With every video frame composed of one to three planes, each consisting of a two-dimensional array of pixel data, and a video clip comprising of thousands of such frames, the sheer volume of data is significant. With the introduction of new high definition video formats such as 4K or stereoscopic 3D, the volume of uncompressed frame data is growing ever larger. Modern CPUs offer performance enhancements for processing digital video through SIMD instructions such as SSE2 or AVX. However, even with these instruction sets, CPUs are limited by their inherently sequential design, and can only operate on a handful of bytes in parallel. Even processors with a multitude of cores only execute on an elementary level of parallelism. GPUs provide an alternative, massively parallel architecture. GPUs differ from CPUs by providing thousands of throughput-oriented cores, instead of a maximum of tens of generalized "good enough at everything" x86 cores. The GPU’s throughput-oriented cores are far more adept at handling large arrays of pixel data, as many video filtering operations can be performed independently. This computational independence allows for pixel processing to scale across hundreds or even thousands of device cores. This thesis explores the utilization of GPUs for video processing, and evaluates the advantages and caveats of porting the modern video filtering framework, Vapoursynth, over to running entirely on the GPU. Compute heavy GPU-enabled video processing results in up to a 108% speedup over an SSE2-optimized, multithreaded CPU implementation.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: