Language Modeling with Gated Convolutional Networks
Facebook AI Research
arXiv:1612.08083 [cs.CL] (23 Dec 2016)
@article{dauphin2016language,
  title={Language Modeling with Gated Convolutional Networks},
  author={Dauphin, Yann N. and Fan, Angela and Auli, Michael and Grangier, David},
  year={2016},
  month={dec},
  eprint={1612.08083},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
The predominant approach to language modeling to date is based on recurrent neural networks. In this paper we present a convolutional approach to language modeling. We introduce a novel gating mechanism that eases gradient propagation and performs better than the LSTM-style gating of Oord et al. (2016) despite being simpler. We achieve a new state of the art on WikiText-103 as well as a new best single-GPU result on the Google Billion Word benchmark. In settings where latency is important, our model achieves an order of magnitude speed-up compared to a recurrent baseline, since computation can be parallelized over time. To our knowledge, this is the first time a non-recurrent approach outperforms strong recurrent models on these tasks.
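The gating mechanism the paper introduces is the gated linear unit (GLU), which computes h(X) = (X*W + b) ⊗ σ(X*V + c): a linear path modulated elementwise by a sigmoid gate, stacked on causal convolutions so no position sees future tokens. Below is a minimal PyTorch sketch of one such block; the class name `GatedConvBlock` and the chosen dimensions are illustrative assumptions, not taken from the authors' released code.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Sketch of a gated (GLU) causal convolution:
    h(X) = (X*W + b) * sigmoid(X*V + c).
    Names and sizes are hypothetical, for illustration only."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        # One convolution produces both the linear path and the gate.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)
        self.pad = kernel_size - 1  # left padding makes it causal

    def forward(self, x):  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # pad the past only
        a, b = self.conv(x).chunk(2, dim=1)      # linear path, gate path
        return a * torch.sigmoid(b)              # gated linear unit

# Usage: token embeddings shaped (batch, dim, time).
x = torch.randn(2, 128, 50)
y = GatedConvBlock(128, kernel_size=4)(x)
print(y.shape)  # torch.Size([2, 128, 50])
```

Because the convolution has no recurrence, all time steps are computed in parallel, which is the source of the latency advantage the abstract describes; the linear (ungated) path also gives gradients a route that is not squashed by a nonlinearity.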
December 26, 2016 by hgpu