high performance computing on graphics processing units: hgpu.org

Fast Speaker Diarization Using a High-Level Scripting Language

Ekaterina Gonina, Gerald Friedland, Henry Cook, Kurt Keutzer

University of California, Berkeley

Automatic Speech Recognition and Understanding Workshop, 2011

@article{gonina2011fast,
  title={Fast Speaker Diarization Using a High-Level Scripting Language},
  author={Gonina, E. and Friedland, G. and Cook, H. and Keutzer, K.},
  year={2011}
}

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine "who spoke when" in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, and thus limits current approaches to roughly real-time performance. Growing datasets require processing hundreds of hours of data, which makes more efficient processing methods highly desirable. With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), one can re-implement GMM training to achieve faster than real-time performance by taking advantage of parallelism in the training computation. However, developing and maintaining complex low-level GPU code is difficult and requires a deep understanding of the hardware architecture of the parallel processor. Furthermore, such low-level implementations are not readily reusable in other applications and are not portable to other platforms, which limits programmer productivity. In this paper, we present a speaker diarization system captured in under 50 lines of Python that achieves 50-250x faster than real-time performance by using a specialization framework to automatically map and execute computationally intensive GMM training on an NVIDIA GPU, without significant loss in accuracy.
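To illustrate why GMM training dominates the cost and why it parallelizes well, the following is a minimal plain-Python sketch of expectation-maximization (EM) for a one-dimensional GMM. It is not the authors' code, it does not use their specialization framework, and it runs on the CPU only. The function names (`gaussian_pdf`, `em_step`, `train_gmm`), the quantile-based initialization, and the variance floor are assumptions made for this example. The point to note is that the E-step handles each sample independently, which is the kind of data parallelism a GPU implementation can exploit.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """Run one EM iteration.

    Returns the updated parameters and the log-likelihood of the data
    under the parameters that were passed in.
    """
    k = len(weights)
    # E-step: every sample's responsibilities are computed independently of
    # all other samples, so this loop is embarrassingly parallel.
    resp = []
    log_likelihood = 0.0
    for x in data:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        log_likelihood += math.log(total)
        resp.append([p / total for p in joint])
    # M-step: re-estimate each component from its responsibility-weighted samples.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # assumed variance floor to avoid collapse
    return new_w, new_m, new_v, log_likelihood


def train_gmm(data, k, iterations=20):
    """Fit a k-component 1-D GMM with a fixed number of EM iterations."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialization: spread the initial means over evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    variances = [overall_var] * k
    history = []
    for _ in range(iterations):
        weights, means, variances, ll = em_step(data, weights, means, variances)
        history.append(ll)
    return weights, means, variances, history


# Usage: two synthetic "speakers" with well-separated 1-D features.
rng = random.Random(0)
samples = [rng.gauss(-5.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, history = train_gmm(samples, k=2)
print("weights:", weights)
print("means:", means)
```

A real diarization system works on multi-dimensional acoustic features and retrains many such models during agglomerative clustering, so the E-step loop above is executed far more often than in this toy example.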

Tags: Clustering, Computer science, CUDA, nVidia, nVidia GeForce GTX 480, Python, Speech recognition

October 31, 2011 by hgpu
