high performance computing on graphics processing units: hgpu.org


Characterization of Speech Recognition Systems on GPU Architectures

Albert Segura Salvador

Departament d’Arquitectura de Computadors, Universitat Politecnica de Catalunya

Universitat Politecnica de Catalunya, 2016

@article{segura2016characterization,
  title={Characterization of Speech Recognition Systems on GPU Architectures},
  author={Segura Salvador, Albert},
  year={2016},
  publisher={Universitat Polit{\`e}cnica de Catalunya}
}


Automatic speech recognition is one of the most important applications in the area of cognitive computing. Mobile devices, such as smartphones, have incorporated speech recognition as one of the main interfaces for user interaction. This trend towards voice-based user interfaces is likely to continue in the coming years. Effective speech recognition systems require real-time recognition, which is very demanding for CPU architectures to achieve. GPU architectures offer parallelization capabilities which can be exploited to increase the performance of speech recognition systems. However, efficiently utilizing the GPU resources for speech recognition is challenging, as the software implementations exhibit irregular and unpredictable memory accesses and poor temporal locality. Our key ambition is to characterize the performance and energy bottlenecks of speech recognition systems when running on a modern GPU, with the aim of providing useful information for designing future GPU architectures. First, we develop a GPU version of the Viterbi search algorithm, which is known to be the main bottleneck by far in speech recognition systems. Second, we analyse the GPU architecture to find the main sources of stalls in the pipeline and the energy bottlenecks. We show that memory stalls are the main reason for the low utilization of GPU resources. We then explore a number of architectural modifications to state-of-the-art GPU architectures in order to deal with the performance-limiting factors, i.e. the memory bottlenecks, and propose a GPU configuration highly tuned for speech recognition. The exploration evaluates different parameters for the memory hierarchy, including the L1 data cache, the L2 cache and the memory controller. We also consider modifications to the core resources and frequency scaling, in order to significantly reduce the number of idle cycles spent waiting for memory and the underutilization of functional units.
Our proposed GPU configuration is able to achieve real-time performance for large-vocabulary speech recognition, while increasing the issue rate from 5.1% to 18.1%, and achieving a power reduction of 31.6%, an energy reduction of 24% and an area reduction of 17.96%.
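
For readers unfamiliar with the Viterbi search named in the abstract, the following is a minimal Python sketch of textbook Viterbi decoding over a tiny hidden Markov model. It is not the thesis's GPU implementation, which is not reproduced on this page. The function name `viterbi`, the two-state model, and all probabilities are illustrative assumptions.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for an observation sequence.

    All probabilities are given as natural logarithms, so products
    become sums and long sequences do not underflow.
    """
    # scores[s]: best log-probability of any path that ends in state s
    scores = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []

    for o in obs[1:]:
        new_scores = {}
        pointers = {}
        # Each destination state is independent of the others within a
        # frame, so this loop is the natural candidate for parallel work.
        for s in states:
            best_prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = (
                scores[best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            )
            pointers[s] = best_prev
        backpointers.append(pointers)
        scores = new_scores

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


def _log_table(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(v) for j, v in row.items()} for k, row in table.items()}


# Hypothetical toy model, chosen only for illustration.
states = ["Healthy", "Fever"]
log_start = {"Healthy": math.log(0.6), "Fever": math.log(0.4)}
log_trans = _log_table({
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
})
log_emit = _log_table({
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
})

best_score, best_path = viterbi(
    ["normal", "cold", "dizzy"], states, log_start, log_trans, log_emit
)
print(best_path, math.exp(best_score))
```

In this sketch the work per frame is a maximum over predecessor scores for every destination state. In a large decoding graph those predecessor lookups are data-dependent, which is consistent with the irregular memory accesses the abstract describes, although the actual data structures used in the thesis may differ.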

Tags: Algorithms, Computer science, CUDA, Deep learning, GPGPU-sim, nVidia, nVidia GeForce GTX 980, Speech recognition, Thesis

September 22, 2016 by hgpu


