high performance computing on graphics processing units: hgpu.org


Massive Exploration of Neural Machine Translation Architectures

Denny Britz, Anna Goldie, Thang Luong, Quoc Le

Google Brain

arXiv:1703.03906 [cs.CL], (11 Mar 2017)

@article{britz2017massive,
  title={Massive Exploration of Neural Machine Translation Architectures},
  author={Britz, Denny and Goldie, Anna and Luong, Thang and Le, Quoc},
  year={2017},
  month={mar},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}


Package: seq2seq: A general-purpose encoder-decoder framework for TensorFlow


Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English-to-German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework that enables researchers to easily experiment with novel techniques and reproduce state-of-the-art results.
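The cost argument in the abstract is multiplicative. Every added hyperparameter axis multiplies the number of configurations, and every configuration may need several random seeds before a variance can be reported. The Python sketch below illustrates that arithmetic under stated assumptions: the search space, the 200 GPU-hours-per-run figure, the seed count and the BLEU scores are hypothetical placeholders, not values taken from the paper, and the helper names are illustrative only (they are not part of the seq2seq package).

```python
from math import prod
from statistics import mean, stdev

# HYPOTHETICAL search space: axes and values are illustrative only,
# not the grid actually used in the paper.
SEARCH_SPACE = {
    "embedding_dim": [128, 256, 512, 1024, 2048],
    "cell_type": ["vanilla_rnn", "gru", "lstm"],
    "encoder_depth": [1, 2, 4, 8],
    "decoder_depth": [1, 2, 4, 8],
    "attention": ["none", "additive", "multiplicative"],
}


def grid_size(space):
    """Number of configurations in a full Cartesian-product grid."""
    return prod(len(values) for values in space.values())


def grid_cost_gpu_hours(space, gpu_hours_per_run, seeds_per_config):
    """GPU hours needed to train every configuration with several seeds."""
    return grid_size(space) * gpu_hours_per_run * seeds_per_config


def summarize_runs(scores):
    """Mean and sample standard deviation of a metric over repeated runs."""
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    print(grid_size(SEARCH_SPACE))  # 720
    # HYPOTHETICAL: 200 GPU hours per training run, 3 seeds per configuration.
    print(grid_cost_gpu_hours(SEARCH_SPACE, gpu_hours_per_run=200, seeds_per_config=3))  # 432000
    # HYPOTHETICAL BLEU scores from four seeds of one configuration.
    bleu_mean, bleu_std = summarize_runs([21.8, 22.1, 21.5, 22.0])
    print(f"BLEU {bleu_mean:.2f} +/- {bleu_std:.2f}")
```

Even this small five-axis grid reaches hundreds of thousands of GPU hours once seeds are included, which is why a study of this kind varies a few axes around a fixed baseline rather than sweeping the full product.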

Tags: Computer science, Deep learning, Neural networks, NLP, nVidia, Package, Python, TensorFlow, Tesla K40

March 14, 2017 by hgpu



