Fast Spoken Query Detection Using Lower-Bound Dynamic Time Warping on Graphical Processing Units
MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA
International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5173-5176, 2012
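@inproceedings{zhang2012fast,
title={Fast Spoken Query Detection Using Lower-Bound Dynamic Time Warping on Graphical Processing Units},
author={Zhang, Yaodong and Adl, Kiarash and Glass, James},
booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={5173--5176},
year={2012}
}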
In this paper we present a fast unsupervised spoken term detection system based on lower-bound Dynamic Time Warping (DTW) search on Graphical Processing Units (GPUs). The lower-bound estimate and the K nearest neighbor DTW search are carefully designed to fit the GPU parallel computing architecture. In a spoken term detection task on the TIMIT corpus, a 55x speed-up is achieved compared to our previous implementation on a CPU without affecting detection performance. On large, artificially created corpora, measurements show that the total computation time of the entire spoken term detection system grows linearly with corpus size. On average, searching a keyword on a single desktop computer with modern GPUs requires 2.4 seconds/corpus hour.
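The idea of pairing a cheap lower bound with a K-nearest-neighbor DTW search can be illustrated with a minimal, sequential CPU sketch. It is not the authors' GPU implementation, and the abstract does not specify their lower-bound estimate or feature representation. As a stand-in, the sketch assumes one-dimensional, equal-length sequences, an absolute-difference local cost, a band constraint on the warping path, and an LB_Keogh-style envelope bound. The function names `dtw_distance`, `lb_keogh`, and `knn_search` are hypothetical.

```python
import heapq


def dtw_distance(query, candidate, band):
    """Banded DTW between two equal-length sequences.

    Assumption: local cost is the absolute difference, and the warping
    path may only visit cells (i, j) with |i - j| <= band.
    """
    n = len(query)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        for j in range(max(1, i - band), min(n, i + band) + 1):
            cost = abs(query[i - 1] - candidate[j - 1])
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[n]


def lb_keogh(query, candidate, band):
    """LB_Keogh-style lower bound on the banded DTW distance above.

    Each candidate point must align with some query point inside its
    band, so its cost is at least its distance to the query envelope.
    """
    n = len(query)
    total = 0.0
    for i, value in enumerate(candidate):
        window = query[max(0, i - band): min(n, i + band + 1)]
        upper, lower = max(window), min(window)
        if value > upper:
            total += value - upper
        elif value < lower:
            total += lower - value
    return total


def knn_search(query, candidates, k, band):
    """Return the k best (distance, index) pairs and the number of DTW calls."""
    # Visit candidates in order of increasing lower bound.
    bounds = sorted(
        (lb_keogh(query, c, band), idx) for idx, c in enumerate(candidates)
    )
    best = []  # max-heap of the current k best, stored as (-distance, index)
    dtw_calls = 0
    for bound, idx in bounds:
        # Once k results exist and the bound cannot beat the worst of them,
        # no remaining candidate can either, so the search stops.
        if len(best) == k and bound >= -best[0][0]:
            break
        dist = dtw_distance(query, candidates[idx], band)
        dtw_calls += 1
        if len(best) < k:
            heapq.heappush(best, (-dist, idx))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, idx))
    results = sorted((-neg, idx) for neg, idx in best)
    return results, dtw_calls
```

Because the bound never exceeds the true DTW distance, pruning on it leaves the top-K result unchanged while skipping full DTW computations for poor candidates. In this sketch both the bound and the DTW computations run one candidate at a time. The system described in the abstract instead arranges these steps to fit the GPU parallel architecture.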
April 6, 2012 by hgpu