Speech Recognition on Multi-Core Processors and GPUs

Patrick Cardinal
École de technologie supérieure, Université du Québec
Université du Québec, 2013


   title={Speech Recognition on Multi-Core Processors and GPUs},
   author={Cardinal, Patrick},
   school={Universit{\'e} du Qu{\'e}bec}





Processor clock speeds have remained roughly stable over the past few years, and the trend may even be toward slower clocks to meet ever-increasing energy-efficiency demands. This tendency is already apparent in mobile devices. To take full advantage of the processing power offered by modern and future processors, applications must exploit parallelism, and speech recognition is no exception. The classic Viterbi decoding algorithm is a dynamic programming approach for searching the recognition network. It does not make full use of this power, mainly because it searches a knowledge graph containing millions of nodes and transitions. In practice, an exhaustive search of such an enormous network is infeasible, so the graph is pruned to retain only the most promising hypotheses. This pruning, however, makes poor use of the memory architecture of Intel-based computers. To overcome this problem, another search algorithm is proposed: A* search. This type of search uses a heuristic that estimates the remaining distance to the final node. A good heuristic leaves only a negligible number of nodes to explore. This shifts the computational load from the network search to the computation of the heuristic, which is designed to make optimal use of modern processor architectures. The heuristic is based on a much smaller speech recognition graph. Because this graph is small, it can be explored exhaustively, which eliminates the problems caused by poor use of the memory architecture.

Acoustic model computations are another important component of speech recognition. For this task, a 3.6x speedup was achieved on a quad-core processor over the single-core version. On a GPU, the speedup is 24.8x over the sequential version.
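The pruning step of classic decoding can be sketched as time-synchronous Viterbi with a beam. At each frame, only hypotheses whose log score lies within `beam` of the frame's best score survive and are extended. This is a minimal sketch over a toy HMM with dense log-probability tables. The names `viterbi_beam`, `log_init`, `log_trans` and `log_emit` are hypothetical and are not taken from the thesis. A real recognition network is a sparse graph with millions of nodes, and this sketch does not reproduce the author's data structures.

```python
import math


def viterbi_beam(log_init, log_trans, log_emit, beam):
    """Time-synchronous Viterbi in the log domain with beam pruning (toy sketch).

    log_init[s]: log P(first state = s); log_trans[s][s2]: log P(s -> s2);
    log_emit[t][s]: acoustic log-likelihood of frame t in state s.
    Returns (best log score, best state sequence, total hypotheses kept).
    """
    n_states = len(log_init)
    scores = {s: log_init[s] + log_emit[0][s] for s in range(n_states)}
    backptrs = []  # one {state: predecessor} dict per frame after the first
    total_active = 0
    for t in range(len(log_emit)):
        if t > 0:
            new_scores, bp = {}, {}
            for s, sc in scores.items():  # only surviving hypotheses are extended
                for s2 in range(n_states):
                    cand = sc + log_trans[s][s2] + log_emit[t][s2]
                    if cand > new_scores.get(s2, -math.inf):
                        new_scores[s2], bp[s2] = cand, s
            scores = new_scores
            backptrs.append(bp)
        # Beam pruning: drop hypotheses more than `beam` below this frame's best.
        best = max(scores.values())
        scores = {s: sc for s, sc in scores.items() if sc >= best - beam}
        total_active += len(scores)
    # Backtrack from the best surviving final state.
    state = max(scores, key=scores.get)
    best_score, path = scores[state], [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    path.reverse()
    return best_score, path, total_active


if __name__ == "__main__":
    log = math.log
    li = [log(0.6), log(0.4)]
    lt = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
    le = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]
    print("exact :", viterbi_beam(li, lt, le, math.inf))
    print("pruned:", viterbi_beam(li, lt, le, 0.0))
```

With `beam=math.inf`, nothing is pruned and the function reduces to exact Viterbi. On this toy model, a zero-width beam keeps one hypothesis per frame and still finds the best path. In general, a tight beam can discard the hypothesis that would have won, which is the speed/accuracy trade-off that pruning makes.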
For the recognition network search, the A* algorithm is shown to explore 28 times fewer nodes than the sequential version of the original algorithm. In addition, the heuristic computation is 4.1 times faster on a quad-core processor and 10.1 times faster on a GPU than the sequential version. Overall, the new parallelized version offers a 4% absolute increase in real-time recognition accuracy compared to the classic version.
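The idea of A* guided by a heuristic computed on a smaller graph can be illustrated on a toy lattice. This sketch is not the thesis's algorithm. It makes three assumptions for illustration:

- It uses a hypothetical left-to-right lattice of `(frame, state)` nodes with made-up costs.
- It builds the heuristic as an abstraction heuristic, a technique swapped in here for illustration. Nodes are merged by a projection, and each merged edge keeps the cheapest concrete edge it stands for.
- It explores the small graph exhaustively with a backward Dijkstra pass.

Under these assumptions, the resulting distances never overestimate the true remaining cost. Running the same search with `h = 0` gives plain Dijkstra, so the number of expanded nodes can be compared directly.

```python
import heapq
from collections import defaultdict

INF = float("inf")


def shortest_to_goal(edges, goal):
    """Exact cost-to-go to `goal` for every node (Dijkstra on reversed edges)."""
    rev = defaultdict(list)
    for u, v, w in edges:
        rev[v].append((u, w))
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in rev[v]:
            if d + w < dist.get(u, INF):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist


def abstraction_heuristic(edges, goal, project):
    """Merge nodes with `project` into a smaller graph, then search it exhaustively.

    Each abstract edge keeps the cheapest concrete edge it represents, so the
    distances never overestimate the true cost (admissible and consistent).
    """
    cheapest = {}
    for u, v, w in edges:
        a, b = project(u), project(v)
        if a != b:  # edges inside one merged node cost >= 0 and can be dropped
            cheapest[(a, b)] = min(w, cheapest.get((a, b), INF))
    small_graph = [(a, b, w) for (a, b), w in cheapest.items()]
    dist = shortest_to_goal(small_graph, project(goal))
    return lambda node: dist.get(project(node), INF)


def a_star(edges, start, goal, h):
    """Cheapest start->goal cost and number of expanded nodes; `h` must be consistent."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
    g = {start: 0.0}
    heap = [(h(start), start)]
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in closed:
            continue
        closed.add(u)  # with a consistent h, g[u] is final once u is popped
        if u == goal:
            return g[u], len(closed)
        for v, w in adj[u]:
            if g[u] + w < g.get(v, INF):
                g[v] = g[u] + w
                heapq.heappush(heap, (g[v] + h(v), v))
    return INF, len(closed)


def toy_lattice(frames, states):
    """Hypothetical lattice: node (t, s) links to (t+1, s) and (t+1, s+1)."""
    edges = []
    for t in range(frames):
        for s in range(states):
            for s2 in (s, s + 1):
                if s2 < states:
                    cost = (s2 - s) + (7 * t + 3 * s2) % 5  # made-up costs
                    edges.append(((t, s), (t + 1, s2), float(cost)))
    return edges, (0, 0), (frames, states - 1)


if __name__ == "__main__":
    edges, start, goal = toy_lattice(6, 5)
    h = abstraction_heuristic(edges, goal, project=lambda n: n[1])  # drop the frame index
    print("A* (cost, expanded):", a_star(edges, start, goal, h))
    print("Dijkstra (cost, expanded):", a_star(edges, start, goal, lambda n: 0.0))
```

Both searches return the same optimal cost, and the heuristic-guided search expands fewer nodes. Here the small graph is obtained by ignoring time, which is far cruder than the reduced recognition graph the abstract describes. It does, however, show the mechanism: the work moves into building the heuristic, which is cheap to explore exhaustively.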
