Multi-Lingual Speech Recognition with Low-Rank Multi-Task Deep Neural Networks
Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
IEEE International Conference on Acoustics, Speech and Signal Processing, 2015
@inproceedings{mohan2015multi,
  title={Multi-Lingual Speech Recognition with Low-Rank Multi-Task Deep Neural Networks},
  author={Mohan, Aanchan and Rose, Richard},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2015}
}
Multi-task learning (MTL) for deep neural network (DNN) multilingual acoustic models has been shown to be effective for learning parameters that are common or shared between multiple languages [1, 2]. In the MTL paradigm, the number of parameters in the output layer is large and scales with the number of languages used in training, so this output layer becomes a computational bottleneck. For mono-lingual DNNs, low-rank matrix factorization (LRMF) of weight matrices has yielded large computational savings [3, 4]. The LRMF proposed in this work for MTL replaces the original language-specific block matrices with a single shared matrix followed by low-rank language-specific block matrices. The impact of LRMF is presented in two scenarios: (a) improving performance in a target language when auxiliary languages are included during multi-lingual training; and (b) cross-language transfer to an unseen language with only one hour of transcribed training data. A 44% parameter reduction in the final layer yields a lower memory footprint and faster training times. An experimental study shows that the LRMF multi-lingual DNN provides performance competitive with a full-rank multi-lingual DNN in both scenarios.
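To make the factorization concrete, below is a minimal NumPy sketch of the idea described in the abstract. The hidden-layer width, rank, and per-language output sizes are illustrative assumptions, not values from the paper. It shows how replacing each full-rank language-specific output block W_l (h x o_l) with a shared projection U (h x r) and a low-rank language-specific block V_l (r x o_l) shrinks the final-layer parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper).
h = 2048                                       # hidden-layer width feeding the output layer
langs = {"en": 3000, "fr": 2800, "de": 3200}   # output (senone) dimension per language
r = 512                                        # rank of the shared factorization

# Full-rank MTL output layer: one h x o_l block per language.
full = {lang: rng.standard_normal((h, o)) for lang, o in langs.items()}

# LRMF output layer: a single shared h x r matrix plus
# low-rank r x o_l language-specific blocks.
U = rng.standard_normal((h, r))
low = {lang: rng.standard_normal((r, o)) for lang, o in langs.items()}

def forward_full(x, lang):
    """Full-rank logits for one language's softmax block."""
    return x @ full[lang]

def forward_lrmf(x, lang):
    """Low-rank logits: project through the shared matrix first."""
    return (x @ U) @ low[lang]

x = rng.standard_normal((1, h))
assert forward_full(x, "en").shape == forward_lrmf(x, "en").shape

n_full = sum(W.size for W in full.values())
n_lrmf = U.size + sum(V.size for V in low.values())
print(f"full-rank params: {n_full:,}")
print(f"LRMF params:      {n_lrmf:,} ({1 - n_lrmf / n_full:.0%} reduction)")
```

With these made-up sizes the sketch prints a reduction of roughly 69%; the actual saving depends on the chosen rank and output dimensions, and the 44% figure quoted in the abstract corresponds to the configuration used in the paper.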