Language Modeling with Gated Convolutional Networks
Facebook AI Research
arXiv:1612.08083 [cs.CL] (23 Dec 2016)
@article{dauphin2016language,
  title={Language Modeling with Gated Convolutional Networks},
  author={Dauphin, Yann N. and Fan, Angela and Auli, Michael and Grangier, David},
  year={2016},
  month={dec},
  eprint={1612.08083},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
The predominant approach to language modeling to date is based on recurrent neural networks. In this paper we present a convolutional approach to language modeling. We introduce a novel gating mechanism that eases gradient propagation and performs better than the LSTM-style gating of Oord et al. (2016) despite being simpler. We achieve a new state of the art on WikiText-103 as well as a new best single-GPU result on the Google Billion Word benchmark. In settings where latency is important, our model achieves an order of magnitude speed-up compared to a recurrent baseline, since computation can be parallelized over time. To our knowledge, this is the first time a non-recurrent approach outperforms strong recurrent models on these tasks.
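The gating mechanism the paper introduces is the gated linear unit (GLU), which computes h(X) = (X*W + b) ⊗ σ(X*V + c): a linear path modulated elementwise by a sigmoid gate, stacked on causal convolutions so no position sees future tokens. Below is a minimal PyTorch sketch of one such block; the class name `GatedConvBlock` and the chosen dimensions are illustrative assumptions, not taken from the authors' released code.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Sketch of a gated (GLU) causal convolution:
    h(X) = (X*W + b) * sigmoid(X*V + c).
    Names and sizes are hypothetical, for illustration only."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        # One convolution produces both the linear path and the gate.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)
        self.pad = kernel_size - 1  # left padding makes it causal

    def forward(self, x):  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # pad the past only
        a, b = self.conv(x).chunk(2, dim=1)      # linear path, gate path
        return a * torch.sigmoid(b)              # gated linear unit

# Usage: token embeddings shaped (batch, dim, time).
x = torch.randn(2, 128, 50)
y = GatedConvBlock(128, kernel_size=4)(x)
print(y.shape)  # torch.Size([2, 128, 50])
```

Because the convolution has no recurrence, all time steps are computed in parallel, which is the source of the latency advantage the abstract describes; the linear (ungated) path also gives gradients a route that is not squashed by a nonlinearity.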
December 26, 2016 by hgpu