Efficient Incremental Text-to-Speech on GPUs
NVIDIA Corporation
arXiv:2211.13939 [cs.SD] (25 Nov 2022)
@misc{https://doi.org/10.48550/arxiv.2211.13939,
doi={10.48550/ARXIV.2211.13939},
url={https://arxiv.org/abs/2211.13939},
author={Du, Muyang and Liu, Chuan and Qi, Jiaxing and Lai, Junjie},
keywords={Sound (cs.SD), Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
title={Efficient Incremental Text-to-Speech on GPUs},
publisher={arXiv},
year={2022},
copyright={arXiv.org perpetual, non-exclusive license}
}
Incremental text-to-speech, also known as streaming TTS, has been increasingly applied in online speech applications that require ultra-low response latency to provide an optimal user experience. However, most existing speech synthesis pipelines deployed on GPUs are still non-incremental, which exposes limitations in high-concurrency scenarios, especially when the pipeline is built from end-to-end neural network models. To address this issue, we present a highly efficient approach to performing real-time incremental TTS on GPUs with Instant Request Pooling and Module-wise Dynamic Batching. Experimental results demonstrate that the proposed method can produce high-quality speech with a first-chunk latency below 80 ms under 100 QPS on a single NVIDIA A10 GPU, and that it significantly outperforms its non-incremental counterpart in both concurrency and latency. Our work demonstrates the effectiveness of high-performance incremental TTS on GPUs.
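The two ideas named in the abstract can be illustrated with a toy scheduler. This is a hypothetical sketch, not the authors' implementation: requests enter a shared pool at any time ("Instant Request Pooling"), and on each step every pipeline module runs once over a batch of all requests currently waiting at that module ("Module-wise Dynamic Batching"), so the first audio chunk can be streamed out before later chunks are synthesized. The `Request`, `acoustic_model`, and `vocoder` names are placeholders standing in for batched neural components.

```python
from collections import deque

class Request:
    """One TTS request, split into text chunks for incremental synthesis."""
    def __init__(self, text_chunks):
        self.pending = deque(text_chunks)  # text still to be synthesized
        self.audio = []                    # audio chunks produced so far

def acoustic_model(batch):
    # stand-in for a batched neural acoustic model (text chunk -> mel chunk)
    return [f"mel({chunk})" for chunk in batch]

def vocoder(batch):
    # stand-in for a batched neural vocoder (mel chunk -> waveform chunk)
    return [f"wav({mel})" for mel in batch]

def serve(pool):
    """Run all pooled requests to completion, batching per module per step."""
    while any(r.pending for r in pool):
        # gather one chunk from every active request -> one dynamic batch
        active = [r for r in pool if r.pending]
        texts = [r.pending.popleft() for r in active]
        mels = acoustic_model(texts)   # module 1, batched across requests
        wavs = vocoder(mels)           # module 2, batched across requests
        for req, wav in zip(active, wavs):
            req.audio.append(wav)      # first chunk is streamable here
        # a new request appended to `pool` at this point would join the
        # very next batch, without waiting for earlier requests to finish
    return pool

pool = [Request(["hello", "world"]), Request(["hi"])]
serve(pool)
```

The key property the sketch shows is that batch size varies per step: once the shorter request finishes, later batches shrink to the remaining requests rather than padding to a fixed batch.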
December 4, 2022 by hgpu