high performance computing on graphics processing units: hgpu.org


Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation

Holger Schwenk, Anthony Rousseau, Mohammed Attik

LIUM, University of Le Mans, 72085 Le Mans cedex 9, France

NAACL workshop on the Future of Language Modeling, 2012

@article{schwenk2012large,
  title={Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation},
  author={Schwenk, H. and Rousseau, A. and Attik, M.},
  journal={NAACL-HLT 2012},
  pages={11},
  year={2012}
}


Source code package: CSLM (Continuous Space Language Model toolkit)


Language models play an important role in large-vocabulary speech recognition and statistical machine translation systems. Back-off language models have been the dominant approach for several decades. Some years ago, there was a clear tendency to build huge language models trained on hundreds of billions of words. Lately, this tendency has changed, and recent work concentrates on data selection. Continuous space methods are a very competitive approach, but they have a high computational complexity and are not yet in widespread use. This paper presents an experimental comparison of all these approaches on a large statistical machine translation task. We also describe an open-source implementation to train and use continuous space language models (CSLM) for such large tasks, including an efficient implementation of the CSLM on Nvidia graphics processing units. By these means, we are able to train a CSLM on more than 500 million words in 20 hours. This CSLM provides an improvement of up to 1.8 BLEU points with respect to the best back-off language model that we were able to build.
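The abstract does not describe the CSLM architecture itself. As a rough illustration only, the sketch below assumes the common feed-forward formulation of a continuous space language model: context words are mapped to continuous vectors, passed through a tanh hidden layer, and scored against the whole vocabulary with a softmax. The class `TinyCSLM`, its parameter names, and the random weights are hypothetical and are not the CSLM toolkit's API or the authors' implementation.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Dense matrix-vector product, the operation a GPU would accelerate."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyCSLM:
    """Hypothetical toy model, assuming a feed-forward architecture."""

    def __init__(self, vocab_size, context_size, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.context_size = context_size
        # One continuous vector per vocabulary word (the projection layer).
        self.embeddings = make_matrix(vocab_size, embed_dim, rng)
        self.hidden = make_matrix(hidden_dim, context_size * embed_dim, rng)
        # One output row per vocabulary word: cost grows with vocabulary size.
        self.output = make_matrix(vocab_size, hidden_dim, rng)

    def next_word_probs(self, context_ids):
        """Return P(word | context) for every word in the vocabulary."""
        if len(context_ids) != self.context_size:
            raise ValueError("expected %d context words" % self.context_size)
        # Concatenate the vectors of the context words.
        projected = [v for i in context_ids for v in self.embeddings[i]]
        hidden = [math.tanh(h) for h in matvec(self.hidden, projected)]
        return softmax(matvec(self.output, hidden))


model = TinyCSLM(vocab_size=10, context_size=3, embed_dim=4, hidden_dim=8)
probs = model.next_word_probs([1, 5, 2])
print(len(probs), round(sum(probs), 6))
```

Under these assumptions, every prediction multiplies the hidden vector by a matrix with one row per vocabulary word, so the work per n-gram grows linearly with vocabulary size. That dense arithmetic is one plausible reading of the "high computational complexity" the abstract mentions, and it is the kind of workload that maps well onto a GPU.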

Tags: Computational Complexity, Computer science, CUDA, nVidia, nVidia GeForce GTX 580, Package, Speech recognition, Tesla M2090

June 11, 2012 by hgpu

