Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition
Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871
arXiv:1410.4281 [cs.CL], (16 Oct 2014)
@article{2014arXiv1410.4281L,
author = {{Li}, X. and {Wu}, X.},
title = "{Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition}",
journal={ArXiv e-prints},
archivePrefix = "arXiv",
eprint={1410.4281},
primaryClass = "cs.CL",
keywords = {Computer Science - Computation and Language, Computer Science - Neural and Evolutionary Computing},
year={2014},
month={oct},
adsurl={http://adsabs.harvard.edu/abs/2014arXiv1410.4281L},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve further performance improvements, this work investigates deep extensions of the LSTM, motivated by the observation that deep hierarchical models have proven more efficient than shallow ones. Building on previous research on constructing deep recurrent neural networks (RNNs), alternative deep LSTM architectures are proposed and empirically evaluated on a large vocabulary conversational telephone speech recognition task. In addition, a training procedure for LSTM networks on multi-GPU devices is introduced and discussed. Experimental results demonstrate that the deep LSTM networks benefit from their depth and yield state-of-the-art performance on this task.
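The core idea of the simplest deep extension discussed in this line of work, stacking LSTM layers so that each layer's hidden-state sequence becomes the next layer's input, can be sketched as follows. This is a minimal illustrative forward pass only, not the authors' implementation: the dimensions, initialization, and the omission of peephole connections and projection layers are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, b):
    """One LSTM time step: all four gates computed from [x; h]
    (peephole connections omitted for simplicity)."""
    z = W @ np.concatenate([x, h]) + b     # pre-activations for the 4 gates
    n = h.size
    i = 1.0 / (1.0 + np.exp(-z[:n]))       # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2*n]))    # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*n:3*n]))  # output gate
    g = np.tanh(z[3*n:])                   # candidate cell update
    c = f * c + i * g                      # new cell state
    h = o * np.tanh(c)                     # new hidden state
    return h, c

def deep_lstm_forward(frames, layers):
    """Run a stack of LSTM layers over a sequence of acoustic frames.
    `layers` is a list of (W, b) pairs; each layer's output sequence
    feeds the layer above it -- the plain stacked deep-LSTM architecture."""
    states = [(np.zeros(b.size // 4), np.zeros(b.size // 4)) for _, b in layers]
    outputs = []
    for x in frames:
        for idx, (W, b) in enumerate(layers):
            h, c = lstm_step(x, *states[idx], W, b)
            states[idx] = (h, c)
            x = h                          # feed the hidden state upward
        outputs.append(x)
    return np.stack(outputs)

def make_layer(in_dim, hid):
    """Random weights for one layer (hypothetical toy initialization)."""
    return (0.1 * rng.standard_normal((4 * hid, in_dim + hid)),
            np.zeros(4 * hid))

# Toy dimensions: 13-dim acoustic frames, two stacked layers of 8 units.
layers = [make_layer(13, 8), make_layer(8, 8)]
seq = rng.standard_normal((20, 13))        # 20 frames of input features
out = deep_lstm_forward(seq, layers)
print(out.shape)                           # (20, 8): one top-layer vector per frame
```

In a real acoustic model the top-layer outputs would be fed to a softmax over context-dependent phone states; the paper's contribution lies in comparing alternative ways of introducing this depth, of which plain stacking is only the baseline.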
October 18, 2014 by hgpu