
Character-level Transformer-based Neural Machine Translation

Nikolay Banar, Walter Daelemans, Mike Kestemont
University of Antwerp, Belgium
arXiv:2005.11239 [cs.CL] (22 May 2020)

@misc{banar2020characterlevel,
    title={Character-level Transformer-based Neural Machine Translation},
    author={Nikolay Banar and Walter Daelemans and Mike Kestemont},
    year={2020},
    eprint={2005.11239},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}


Neural machine translation (NMT) is nowadays commonly applied at the subword level, using byte-pair encoding. A promising alternative approach focuses on character-level translation, which considerably simplifies processing pipelines in NMT. This approach, however, must handle relatively longer sequences, which renders the training process prohibitively expensive. In this paper, we discuss a novel, Transformer-based approach that we compare, both in speed and in quality, to the Transformer at the subword and character levels, as well as to previously developed character-level models. We evaluate our models on 4 language pairs from WMT’15: DE-EN, CS-EN, FI-EN and RU-EN. The proposed architecture can be trained on a single GPU and is 34% faster than the character-level Transformer; still, the obtained results are at least on par with it. In addition, our proposed model outperforms the subword-level model on FI-EN and achieves close results on CS-EN. To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.
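
To illustrate the sequence-length trade-off the abstract refers to, the toy Python sketch below contrasts character-level segmentation with a crude stand-in for subword segmentation. It is not the authors' pipeline: toy_bpe_tokenize is a hypothetical placeholder that merely splits on whitespace where a real trained BPE model (e.g. subword-nmt or SentencePiece) would be used; it only serves to show why character-level inputs are several times longer and hence costlier to train on.

# Minimal sketch (assumptions noted above): compares the token counts produced
# by character-level segmentation and a whitespace-based stand-in for BPE.

def char_tokenize(sentence: str) -> list[str]:
    # Character-level tokens; spaces are kept as explicit symbols.
    return [c if c != " " else "<space>" for c in sentence]

def toy_bpe_tokenize(sentence: str) -> list[str]:
    # Hypothetical placeholder for a trained BPE segmenter; real subword units
    # would be somewhat shorter than full words, but on a similar scale.
    return sentence.split()

if __name__ == "__main__":
    src = "Neural machine translation simplifies processing pipelines"
    chars = char_tokenize(src)
    subwords = toy_bpe_tokenize(src)
    # Character sequences are typically several times longer, which is the main
    # source of the extra training cost discussed in the abstract.
    print("subword-like tokens:", len(subwords))
    print("character tokens:   ", len(chars))

Running the sketch on the example sentence yields far more character tokens than word-scale units, which is why attention cost and training time grow so sharply at the character level.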