high performance computing on graphics processing units: hgpu.org

Deep Architectures for Neural Machine Translation

Antonio Valerio Miceli Barone, Jindrich Helcl, Rico Sennrich, Barry Haddow, Alexandra Birch

School of Informatics, University of Edinburgh

arXiv:1707.07631 [cs.CL], (24 Jul 2017)

@article{barone2017deep,
  title={Deep Architectures for Neural Machine Translation},
  author={Barone, Antonio Valerio Miceli and Helcl, Jindrich and Sennrich, Rico and Haddow, Barry and Birch, Alexandra},
  year={2017},
  month={jul},
  archivePrefix={"arXiv"},
  primaryClass={cs.CL}
}

Package:

NEMATUS: Open-Source Neural Machine Translation in Theano

It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study.
In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel “BiDeep” RNN architecture that combines deep transition RNNs and stacked RNNs.
Our evaluation is carried out on the English-to-German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain the best improvements with a BiDeep RNN of combined depth 8, which yields an average improvement of 1.5 BLEU over a strong shallow baseline.
We release our code for ease of adoption.
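
The abstract contrasts stacked RNNs, deep transition RNNs, and a "BiDeep" combination of the two. The sketch below is only an illustration of that distinction under simplifying assumptions: it uses a plain tanh cell rather than the gated cells NMT systems typically use, it omits attention and any residual connections, and every name in it (`make_cell`, `step`, `deep_transition_rnn`, `stacked_rnn`, `bideep_rnn`) is hypothetical. It is not the Nematus implementation.

```python
import math
import random

def make_cell(in_dim, hid_dim, rng):
    """Hypothetical tanh cell: h' = tanh(W x + U h + b).
    in_dim may be 0 for cells that receive no external input."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W": mat(hid_dim, in_dim), "U": mat(hid_dim, hid_dim), "b": [0.0] * hid_dim}

def step(cell, x, h):
    """Apply one cell once: combine input x and previous state h."""
    out = []
    for i in range(len(h)):
        s = cell["b"][i]
        s += sum(w * xj for w, xj in zip(cell["W"][i], x))
        s += sum(u * hj for u, hj in zip(cell["U"][i], h))
        out.append(math.tanh(s))
    return out

def deep_transition_rnn(cells, xs, hid_dim):
    """Deep transition: several cells are applied in sequence WITHIN each
    time step. Only the first cell sees the input; the rest refine the state."""
    h = [0.0] * hid_dim
    outputs = []
    for x in xs:
        h = step(cells[0], x, h)
        for cell in cells[1:]:
            h = step(cell, [], h)  # no external input for inner transitions
        outputs.append(h)
    return outputs

def stacked_rnn(layers, xs, hid_dim):
    """Stacked: each layer is its own recurrence over the whole sequence,
    and its output sequence is the input sequence of the next layer."""
    seq = xs
    for cell in layers:
        h = [0.0] * hid_dim
        out = []
        for x in seq:
            h = step(cell, x, h)
            out.append(h)
        seq = out
    return seq

def bideep_rnn(stack, xs, hid_dim):
    """Assumed reading of 'BiDeep': a stack whose layers are themselves
    deep transition RNNs."""
    seq = xs
    for cells in stack:
        seq = deep_transition_rnn(cells, seq, hid_dim)
    return seq

rng = random.Random(0)
in_dim, hid_dim = 3, 4
xs = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(5)]

# Illustrative configuration only: 2 stacked layers, 2 transition cells each.
stack = [
    [make_cell(in_dim, hid_dim, rng), make_cell(0, hid_dim, rng)],
    [make_cell(hid_dim, hid_dim, rng), make_cell(0, hid_dim, rng)],
]
states = bideep_rnn(stack, xs, hid_dim)
print(len(states), len(states[0]))
```

The configuration at the bottom is chosen for brevity and is not the depth-8 setup reported in the abstract. The point of the sketch is structural: depth can be added along the time-step transition, along the layer stack, or along both at once.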

Tags: Computer science, CUDA, Deep learning, NLP, nVidia, nVidia GeForce GTX Titan X, Package, Theano

August 1, 2017 by hgpu
