Design and Development of an Efficient H.264 Video Encoder for CPU/GPU using OpenCL

Shaikh Mohd. Laeeq, Gangadhar N. D., Brian Gee Chacko
Computer Engineering, M. S. Ramaiah School of Advanced Studies, Bangalore
SASTECH Journal, Volume 11, Issue 2, 2012

@article{laeeq2012design,
   title={Design and Development of an Efficient H.264 Video Encoder for CPU/GPU using OpenCL},
   author={Laeeq, S.M. and Gangadhar, N.D. and Chacko, B.G.},
   journal={SASTECH Journal},
   volume={11},
   number={2},
   year={2012}
}



Video codecs have undergone dramatic improvements and grown in complexity over the years, driven by commercial products such as mobile phones and tablet PCs. Standards such as H.264, which has emerged as the de facto standard for video, have brought uniformity to the delivery of video. Under memory and transmission-bandwidth constraints, the focus has been on effective compression and decompression of video. Multicore architectures are increasingly available on mobile phones and tablet PCs, and as codecs have grown in complexity and become computationally intensive, it is all the more important to distribute that computation over multicore hardware. The OpenCL framework for programming multicore hardware architectures such as CPUs, GPUs and DSPs has reached a high level of maturity. In this study, an efficient H.264 video encoder is developed using OpenCL for multicore architectures, based on the x264 open-source H.264 library. The x264 library is profiled with sample videos on a CPU, and performance hotspots are identified for optimisation. These hotspots are optimised by encapsulating them in OpenCL kernel loops, in which four parallel threads are created with OpenMP. Further, compiler optimisation flags and assembly instructions within the x264 library are used to improve memory efficiency and execution speed. Programs to identify and use the queried OpenCL CPU device and to analyse the PCI bandwidth between the host and the device are developed. When launched on CPU and GPU platforms with the OpenCL APIs and multithreading, improvements in execution time and in the number of system calls are observed. The x264_pixel_satd_8x4 hotspot gained 1.2 seconds over the earlier non-OpenCL optimisation on the CPU and 0.4 seconds on the GPU. The degradation in performance on the GPU platform is due to read and write latencies.
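For context, the x264_pixel_satd_8x4 hotspot computes a sum of absolute transformed differences (SATD): the residual between two pixel blocks is Hadamard-transformed and the absolute values of the coefficients are summed. The following is a minimal scalar C sketch of the 4x4 building block, not x264's actual SIMD implementation; the final halving is a common convention and is assumed here:

```c
#include <stdlib.h>   /* abs */

/* 4-point Hadamard butterfly, applied in place. */
void hadamard4(int d[4])
{
    int s01 = d[0] + d[1], d01 = d[0] - d[1];
    int s23 = d[2] + d[3], d23 = d[2] - d[3];
    d[0] = s01 + s23;
    d[1] = s01 - s23;
    d[2] = d01 + d23;
    d[3] = d01 - d23;
}

/* SATD of a 4x4 block: Hadamard-transform the residual along rows
   and columns, then sum the absolute coefficients (halved). */
int satd_4x4(const unsigned char *pix1, int stride1,
             const unsigned char *pix2, int stride2)
{
    int tmp[4][4];
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++)
            tmp[i][j] = pix1[i * stride1 + j] - pix2[i * stride2 + j];
        hadamard4(tmp[i]);                 /* transform rows */
    }
    int sum = 0;
    for (int j = 0; j < 4; j++) {
        int col[4] = { tmp[0][j], tmp[1][j], tmp[2][j], tmp[3][j] };
        hadamard4(col);                    /* transform columns */
        for (int i = 0; i < 4; i++)
            sum += abs(col[i]);
    }
    return (sum + 1) >> 1;
}
```

An 8x4 SATD can be assembled from such 4x4 sub-blocks; the many independent additions and subtractions per pixel are what make the routine both a profiling hotspot and a good candidate for SIMD and data-parallel offload.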
However, combining this with compiler optimisation flags and assembly instructions across the entire x264 library resulted in a 4.3X improvement on the CPU and 4.2X on the GPU platform. It can be concluded that, alongside multithreading with OpenCL, the traditional approach of compiler-level optimisation remains important, as it addresses the core performance of the application considered.
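The four-thread arrangement described in the abstract can be pictured as an OpenMP parallel loop over coding blocks with a reduction over their costs. In this sketch, block_cost is a hypothetical stand-in for a per-block hotspot routine such as SATD; it is not x264 code:

```c
#ifdef _OPENMP
#include <omp.h>      /* OpenMP runtime, if available */
#endif

/* Hypothetical per-block cost function standing in for a real
   hotspot such as SATD computation. */
int block_cost(int b)
{
    return b % 7;
}

/* Sum per-block costs across four OpenMP threads, mirroring the
   paper's approach of wrapping hotspot loops in parallel regions.
   Without -fopenmp the pragma is ignored and the loop runs serially,
   producing the same result. */
long encode_cost_parallel(int n_blocks)
{
    long total = 0;
    #pragma omp parallel for num_threads(4) reduction(+:total)
    for (int b = 0; b < n_blocks; b++)
        total += block_cost(b);
    return total;
}
```

Because each block's cost is independent, the reduction clause is the only coordination the threads need, which is what makes such loops attractive both for OpenMP threading and for OpenCL kernel offload.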


* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units, listed below. There is no restriction on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
