An Automatic Speech Recognition Application Framework for Highly Parallel Implementations on the GPU
Electrical Engineering and Computer Sciences, University of California at Berkeley
University of California, Berkeley, Technical Report No. UCB/EECS-2012-47, 2012
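@techreport{chong2012automatic,
  title={An Automatic Speech Recognition Application Framework for Highly Parallel Implementations on the GPU},
  author={Chong, J. and Gonina, E. and Kolossa, D. and Zeiler, S. and Keutzer, K.},
  institution={Electrical Engineering and Computer Sciences, University of California at Berkeley},
  number={UCB/EECS-2012-47},
  year={2012}
}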
Data layout, data placement, and synchronization are not usually among a speech application expert’s daily concerns. Yet failing to account for them carefully in a highly parallel implementation on graphics processing units (GPUs) can cost an order of magnitude in application performance. In this paper we present an application framework for parallel programming of automatic speech recognition (ASR) applications that allows a speech application expert to implement speech applications effectively on the GPU. It is an approach for crystallizing and transferring the often tacit knowledge of effective parallel programming techniques while allowing flexible adaptation to various application usage scenarios. The framework includes an application context description, a software architecture, a reference implementation, and a set of extension points for flexible customization. We describe how a speech expert can use the framework in a parallel application design flow, and we present two case studies that illustrate how the framework adapts to different usage scenarios. The case studies extend the framework to an advanced audio-only speech recognition application and to an audio-visual recognition application that enables lip-reading in high-noise recognition environments. The adaptation to the latter scenario also demonstrates how the ASR application framework enabled a MATLAB/Java programmer to use a GPU effectively, producing an implementation that achieves a 20x speedup in recognition throughput compared to a sequential CPU implementation.
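As a rough illustration of what an extension point of this kind might look like, the following minimal Python sketch keeps a fixed decoding core and lets the caller plug in the observation-scoring step. It is not the authors' framework or API: every name here (`viterbi_decode`, `audio_only_scorer`, `make_audio_visual_scorer`, the `audio` and `video` frame keys, the stream weight) is hypothetical, the search is a plain sequential Viterbi pass rather than a GPU implementation, and the linear weighting of audio and video log-likelihoods is an assumed fusion rule chosen for simplicity.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical extension point: maps one frame of features to
# per-state observation log-likelihoods.
Frame = Dict[str, List[float]]
ObservationScorer = Callable[[Frame], List[float]]


def viterbi_decode(
    frames: Sequence[Frame],
    log_init: List[float],
    log_trans: List[List[float]],
    score_frame: ObservationScorer,
) -> List[int]:
    """Fixed core (assumed): log-domain Viterbi search over an HMM.

    Only `score_frame` is meant to be swapped by the application expert.
    """
    n_states = len(log_init)
    obs = score_frame(frames[0])
    delta = [log_init[s] + obs[s] for s in range(n_states)]
    backpointers: List[List[int]] = []

    for frame in frames[1:]:
        obs = score_frame(frame)
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s.
            best_prev = max(range(n_states),
                            key=lambda p: delta[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_delta.append(delta[best_prev] + log_trans[best_prev][s] + obs[s])
        delta = new_delta
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def audio_only_scorer(frame: Frame) -> List[float]:
    """Plug-in 1 (toy): use the audio stream's log-likelihoods directly."""
    return list(frame["audio"])


def make_audio_visual_scorer(audio_weight: float) -> ObservationScorer:
    """Plug-in 2 (toy): weighted sum of audio and video log-likelihoods."""
    def scorer(frame: Frame) -> List[float]:
        return [audio_weight * a + (1.0 - audio_weight) * v
                for a, v in zip(frame["audio"], frame["video"])]
    return scorer


if __name__ == "__main__":
    half = math.log(0.5)
    frames = [
        {"audio": [-1.0, -2.0], "video": [-3.0, -0.5]},
        {"audio": [-1.0, -1.5], "video": [-4.0, -0.2]},
    ]
    init, trans = [half, half], [[half, half], [half, half]]
    print(viterbi_decode(frames, init, trans, audio_only_scorer))
    print(viterbi_decode(frames, init, trans, make_audio_visual_scorer(0.5)))
```

In this toy setup the decoding loop never changes; moving from the audio-only scenario to the audio-visual one only means passing a different scorer, which is the kind of customization the abstract attributes to extension points.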
May 15, 2012 by hgpu