Characterization of Speech Recognition Systems on GPU Architectures

Albert Segura Salvador
Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya
Universitat Politècnica de Catalunya, 2016


   title={Characterization of Speech Recognition Systems on GPU Architectures},
   author={Segura Salvador, Albert},
   publisher={Universitat Polit{\`e}cnica de Catalunya}


Automatic speech recognition is one of the most important applications in the area of cognitive computing. Mobile devices, such as smartphones, have incorporated speech recognition as one of the main interfaces for user interaction, and this trend towards voice-based user interfaces is likely to continue in the coming years. Effective speech recognition systems require real-time performance, which is hard to reach on CPU architectures. GPU architectures offer parallelization capabilities that can be exploited to increase the performance of speech recognition systems. However, using GPU resources efficiently for speech recognition is challenging, as the software implementations exhibit irregular and unpredictable memory accesses and poor temporal locality. Our key ambition is to characterize the performance and energy bottlenecks of speech recognition systems running on a modern GPU, with the aim of providing useful information for the design of future GPU architectures. First, we develop a GPU version of the Viterbi search algorithm, which is known to be by far the main bottleneck in speech recognition systems. Second, we analyse the GPU architecture to find the main sources of pipeline stalls and the energy bottlenecks. We show that memory stalls are the main reason for the low utilization of GPU resources. We then explore a number of architectural modifications to state-of-the-art GPU architectures that address the performance-limiting factors, i.e. the memory bottlenecks, and propose a GPU configuration highly tuned for speech recognition. The exploration evaluates different parameters of the memory hierarchy, including the L1 data cache, the L2 cache and the memory controller. We also consider modifications to the core resources and frequency scaling in order to significantly reduce both the number of idle cycles spent waiting for memory and the underutilization of functional units.
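To make the Viterbi search concrete, here is a minimal CPU sketch in Python. It is not the thesis implementation: it assumes a small hidden Markov model with dense transition tables, log-domain scores and a simple score-based beam, whereas large-vocabulary recognizers typically search a much larger sparse graph. The function name `viterbi` and its parameters (`log_init`, `log_trans`, `log_emit`, `beam`) are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def viterbi(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_emit: Sequence[Sequence[float]],
    beam: float = math.inf,
) -> Tuple[List[int], float]:
    """Hypothetical sketch: best state path and its log score.

    log_init[s]     : log P(state s at frame 0)
    log_trans[p][s] : log P(s | p)
    log_emit[t][s]  : log acoustic likelihood of frame t in state s
    beam            : states scoring more than `beam` below the frame's best are pruned
    """
    n_states = len(log_init)
    n_frames = len(log_emit)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []

    for t in range(1, n_frames):
        best = max(score)
        # Beam pruning: only predecessors within `beam` of the best stay active.
        active = [p for p in range(n_states) if score[p] >= best - beam]
        new_score = [NEG_INF] * n_states
        bp = [-1] * n_states
        # Within one frame every destination state s is independent of the others;
        # this is the parallelism a GPU version could map onto threads.
        for s in range(n_states):
            for p in active:
                cand = score[p] + log_trans[p][s]
                if cand > new_score[s]:
                    new_score[s] = cand
                    bp[s] = p
            new_score[s] += log_emit[t][s]
        score = new_score
        backptrs.append(bp)

    # Backtrace from the best-scoring state in the last frame.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, score[last]
```

On the textbook two-state model with initial probabilities [0.6, 0.4], transitions [[0.7, 0.3], [0.4, 0.6]] and three frames whose emission likelihoods are [0.5, 0.1], [0.4, 0.3] and [0.1, 0.6], the sketch returns the path [0, 0, 1] with probability 0.01512, with or without `beam=1.0`. Note that which `score[p]` and `log_trans[p][s]` entries are read depends on which states survive pruning; in a real sparse search graph this data-dependent access pattern is one way to see where the irregular memory accesses described above come from.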
Our proposed GPU configuration achieves real-time performance for large-vocabulary speech recognition while increasing the issue rate from 5.1% to 18.1%, and it reduces power by 31.6%, energy by 24% and area by 17.96%.
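The power and energy figures can be related through a short calculation. Assuming that energy is average power multiplied by execution time, and that both reductions are measured against the same baseline GPU on the same workload (an assumption, not stated explicitly above), the implied change in execution time is:

```latex
E = P\,T
\quad\Rightarrow\quad
\frac{T_{\text{tuned}}}{T_{\text{base}}}
  = \frac{E_{\text{tuned}}/E_{\text{base}}}{P_{\text{tuned}}/P_{\text{base}}}
  = \frac{1 - 0.24}{1 - 0.316}
  = \frac{0.76}{0.684}
  \approx 1.11
```

Under these assumptions the tuned configuration would run roughly 11% longer than the baseline while still meeting the real-time requirement, trading some speed for lower power and energy.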