high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Speech Recognition on Multi-Core Processors and GPUs

Speech Recognition on Multi-Core Processors and GPUs

Patrick Cardinal

Ecole De Technologie Superieure, Universite Du Quebec

Universite Du Quebec, 2013

BibTeX

Download (PDF)

View

Source

2305

views

The speed of processors has remained stable over the past few years. The trend may even be towards slower speeds in order to satisfy the ever increasing demands of energy efficiency. This tendency is already apparent in the area of mobile devices. In order to take full advantage of the processing power offered by modern and future processors, applications must integrate parallelism and speech recognition is no exception. The classic decoding algorithm of Viterbi, a dynamic programming approach for searching in the recognition network, does not make full use of this power. The main reason being that the algorithm searches through a knowledge graph containing millions of nodes and transitions. In practice, a thorough search through such an enormous network is unfeasible. As a result, the graph is pruned so as to retain the most promising hypotheses only. The pruning process is however connected with a misuse of the memory architecture of Intel-based computers. To overcome this problem, another search algorithm is proposed: the A* search. This type of search makes use of a heuristic that provides an approximation of the distance for reaching the final node. A good heuristic results in a negligible number of nodes having to be explored, allowing to transfer the computational load of the network search towards the computation of the heuristic, so designed to make optimal use of modern processor architectures. The heuristic represents a much smaller knowledge graph for speech recognition. Because of its small size, the graph can be exhaustively explored thus eliminating the problems relating to memory architecture mismanagement. Acoustic model computations represent an important component of speech recognition. For this task, a 3.6x speed increase was achieved on a quad core processor with respect to the single core version. On GPU, the acceleration is 24.8x with respect to the sequential version. In regards to the recognition network search, the A* algorithm is shown to explore 28 times less nodes than the sequential version of the original algorithm. In addition, the heuristic computation is 4.1 and 10.1 times faster on a quad core and GPU than the sequential version respectively. Overall, the new parallelized version offers a 4% absolute increase in real-time recognition accuracy compared to the classic version.

Tags: Algorithms, CUDA, nVidia, nVidia GeForce GTX 295, Signal processing, Speech recognition, Thesis

August 13, 2013 by hgpu

Rating: 2.5/5. From 2 votes.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Speech Recognition on Multi-Core Processors and GPUs

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Speech Recognition on Multi-Core Processors and GPUs

Share this:

Recent source codes

Most viewed papers (last 30 days)