GPU-accelerated HMM for Speech Recognition

Leiming Yu, Yash Ukidave, David Kaeli
Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
Workshop Series on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA), 2014


Citation: Leiming Yu, Yash Ukidave, David Kaeli. "GPU-accelerated HMM for Speech Recognition." In Heterogeneous and Unconventional Cluster Architectures and Applications Workshop (HUCAA’14), IEEE, 2014.




Speech recognition is used in a wide range of applications and devices, such as mobile phones, in-car entertainment systems, and web-based services. Hidden Markov Models (HMMs) are one of the most popular algorithmic approaches to speech recognition. Training and testing an HMM is computationally intensive and time-consuming. Running other applications concurrently with speech recognition can overwhelm the available compute resources and introduce unwanted delays in speech processing, eventually causing words to be dropped because of buffer overruns. Graphics processing units (GPUs) have become widely accepted as accelerators that offer massive amounts of parallelism. The host processor (the CPU) can offload compute-intensive portions of an application to the GPU, leaving the CPU free to focus on serial tasks and scheduling operations. In this paper, we present a parallelized Hidden Markov Model that accelerates isolated-word speech recognition. We experiment with different optimization schemes and use optimized GPU computing libraries to speed up the computation on GPUs. We also explore the performance benefits of advanced GPU features that allow multiple compute kernels to execute concurrently. The algorithms are evaluated on multiple NVIDIA GPUs using CUDA as the programming framework. Our GPU implementation outperforms traditional serial and multithreaded implementations. Considering end-to-end performance, which includes both data transfer and computation, we achieve a 9x speedup for training on a GPU over a multithreaded version optimized for a multi-core CPU.
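To make "testing an HMM" for isolated-word recognition concrete, the sketch below scores an observation sequence against one HMM per word with the standard (scaled) forward algorithm and picks the best-scoring word. It is a CPU reference sketch under stated assumptions, not the authors' CUDA implementation: it assumes discrete observation symbols (real recognizers often use Gaussian-mixture emissions), and the names `forward_log_likelihood` and `recognize` are hypothetical. The inner list comprehension over states `j` is the data-parallel step: each new forward value depends only on the previous time step, which is why a GPU version can assign one thread per state.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm for a discrete-observation HMM (illustrative sketch).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability of emitting symbol k in state j
    obs     : non-empty sequence of observation symbol indices
    Returns log P(obs | model).
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    if scale == 0.0:
        return float("-inf")
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    log_lik = math.log(scale)
    for o in obs[1:]:
        # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * A[i][j]) * b_j(o_t).
        # The n values of j are independent of each other, so this loop is
        # the part a GPU version could spread across threads (one per state).
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        # log P(obs) is the sum of the logs of the per-step scale factors
        log_lik += math.log(scale)
    return log_lik


def recognize(models, obs):
    """Isolated-word recognition: return the word whose HMM (pi, A, B) scores obs highest."""
    return max(models, key=lambda w: forward_log_likelihood(*models[w], obs))


# Usage with two hypothetical single-state word models over a 2-symbol alphabet:
models = {
    "yes": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "no": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
print(recognize(models, [0, 0, 1]))  # "yes": 0.9*0.9*0.1 beats 0.1*0.1*0.9
```

Rescaling at every step keeps the forward variables in floating-point range for long utterances; the same per-step normalization is a reduction over states, which on a GPU would typically be done with a parallel sum.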

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
