CPU and GPU Co-processing for Sound

Aleksander Gjermundsen

Department of Computer and Information Science, Faculty of Information Technology, Mathematics and Electrical Engineering, Norwegian University of Science and Technology

Norwegian University of Science and Technology, 2010

Package: Speex

In voice communications, one problematic phenomenon is participants hearing an echo of their own voice. Acoustic echo cancellation (AEC) removes this echo, but it can be computationally demanding. The recent OpenCL standard allows high-level programs to run on multi-core CPUs as well as on graphics processing units (GPUs) and custom accelerators. This opens up new possibilities for offloading computations, which is especially important for real-time applications. Although many image- and video-processing algorithms have been studied on the GPU, audio processing algorithms have not been researched as thoroughly. This may be because these algorithms are not viewed as computationally heavy, and therefore not as suitable for GPU offloading as, for instance, dense linear algebra.

This thesis studies the AEC filter from Speex, an open-source library for speech compression and audio preprocessing. We translate the original code into an optimized OpenCL program that can run on both CPUs and GPUs. Because the overhead of the OpenCL vendor implementations dominates running times, our results show that the existing reference implementation is faster for single-channel input/output, owing to its simplicity and low computational intensity. However, when the number of channels processed by the filter and the length of the echo tail are increased, a speed-up of up to 5x on CPU+GPU over CPU only was achieved.

Although these cases may not be the most common, the techniques developed in this thesis are expected to become increasingly important as GPUs and CPUs become more tightly integrated, especially on embedded devices. Tighter integration makes latencies less of an issue and hence strengthens the value of our results. An outline for future work in this area is also included.
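An AEC filter adapts a model of the echo path from loudspeaker to microphone and subtracts its echo estimate from the microphone signal. The following is a minimal Python sketch of that idea, assuming a plain time-domain NLMS (normalized least mean squares) filter. This is a swapped-in textbook technique chosen for brevity; it is not the filter used in Speex or the thesis's OpenCL code. The function names, the step size `mu`, and the simulated echo path are hypothetical. The sketch also illustrates why the two scaling knobs mentioned above matter: each output sample costs work proportional to the tail length, and in this sketch every channel carries its own independent filter, so total work grows with channels × tail length.

```python
import random


def nlms_echo_cancel(far_end, mic, tail_length, mu=0.5, eps=1e-6):
    """Cancel the echo of `far_end` in `mic` with a time-domain NLMS filter.

    Illustrative textbook sketch only, not the Speex algorithm.
    far_end:     samples sent to the loudspeaker
    mic:         samples captured by the microphone (near-end speech + echo)
    tail_length: number of filter taps, i.e. how many samples of echo tail
                 the filter can model
    Returns the error signal e[n] = mic[n] - estimated_echo[n], which is the
    echo-cancelled output.
    """
    w = [0.0] * tail_length        # adaptive estimate of the echo path
    x_hist = [0.0] * tail_length   # x_hist[k] == far_end[n - k] (newest first)
    out = []
    for x, d in zip(far_end, mic):
        x_hist.pop()               # discard the oldest far-end sample
        x_hist.insert(0, x)        # shift in the newest one
        # Echo estimate: FIR filter over the last `tail_length` far-end samples.
        y = sum(wk * xk for wk, xk in zip(w, x_hist))
        e = d - y
        # NLMS update: step size normalized by the input energy in the window.
        step = mu * e / (sum(xk * xk for xk in x_hist) + eps)
        for k in range(tail_length):
            w[k] += step * x_hist[k]
        out.append(e)
    return out


def cancel_channels(far_ends, mics, tail_length, mu=0.5):
    """Run one independent NLMS filter per channel (hypothetical helper).

    Work per call is proportional to channels * samples * tail_length, and
    the channels share no state, so each could go to a separate core or
    GPU work-item.
    """
    return [nlms_echo_cancel(x, d, tail_length, mu) for x, d in zip(far_ends, mics)]


def simulate_echo(far_end, echo_path):
    """Hypothetical room: the microphone hears only the loudspeaker signal
    convolved with `echo_path` (no near-end speech, no noise)."""
    return [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
            for n in range(len(far_end))]


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


if __name__ == "__main__":
    random.seed(0)
    far = [random.gauss(0.0, 1.0) for _ in range(4000)]
    mic = simulate_echo(far, [0.0, 0.5, 0.0, -0.3, 0.1])  # made-up 5-tap echo path
    out = nlms_echo_cancel(far, mic, tail_length=8)
    # Ratio of residual echo to original echo over the last 1000 samples.
    print(energy(out[-1000:]) / energy(mic[-1000:]))
```

The 8-tap filter is longer than the 5-tap simulated echo path, so it can model the whole tail. With a single short channel, the per-sample work is only a few dozen multiply-adds. This is consistent with the abstract's observation that offloading overhead dominates in that case and only pays off as channels and tail length grow.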

Tags: Algorithms, ATI, ATI Radeon HD 5870, Computer science, Filtering, Linear Algebra, nVidia, nVidia GeForce GTX 280, OpenCL, Package, Signal processing, Tesla C2050, Thesis

October 30, 2011 by hgpu
