Parallel implementation of Artificial Neural Network training for speech recognition

Stefano Scanzio, Sandro Cumani, Roberto Gemello, Franco Mana, P. Laface
Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino 10129, Italy
Pattern Recognition Letters, Volume 31, Issue 11, 1 August 2010, Pages 1302-1309 (16 February 2010)

@article{scanzio2010parallel,
   title={Parallel implementation of Artificial Neural Network training for speech recognition},
   author={Scanzio, S. and Cumani, S. and Gemello, R. and Mana, F. and Laface, P.},
   journal={Pattern Recognition Letters},
   volume={31},
   number={11},
   pages={1302--1309},
   issn={0167-8655},
   year={2010},
   publisher={Elsevier}
}

In this paper we describe the implementation of a complete ANN training procedure based on the block-mode back-propagation learning algorithm for sequential patterns, such as the observation feature vectors of a speech recognition system, exploiting the high-performance SIMD architecture of the GPU through CUDA and its C-like language interface. We also compare this with the speed-up obtained by implementing the training procedure using only the multi-thread capabilities of multi-core processors. Our implementation takes into account all the aspects peculiar to training on large-scale sequential patterns, in particular the re-segmentation of the training sentences, the block size for the feed-forward and back-propagation steps, and the transfer of huge amounts of data from host memory to the GPU card. The approach has been tested by training acoustic models for large vocabulary speech recognition tasks, showing a six-fold reduction of the time required to train real-world, large-size networks with respect to an already optimized implementation using the Intel MKL libraries. Thanks to these optimizations and to the support of the GPU, the training time for a language with a huge set of training sentences (about one million for Italian) can be reduced from approximately a month to 5 days.
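To illustrate the block-mode idea mentioned in the abstract, the sketch below (not the authors' code; layer sizes, the block size, cuBLAS usage and the sigmoid non-linearity are illustrative assumptions) packs a block of feature frames into a matrix so that the feed-forward step of one layer becomes a single SGEMM plus an element-wise activation, which is what lets the GPU's SIMD architecture be exploited.

/* Minimal sketch of block-mode feed-forward for one ANN layer on the GPU.
 * A block of `block` feature frames is stored column-major in X (n_in x block),
 * so Y = sigmoid(W * X + bias) is computed with one cuBLAS SGEMM and one kernel. */
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <math.h>

/* add the bias and apply a logistic sigmoid to every activation */
__global__ void bias_sigmoid(float *y, const float *bias, int n_out, int block)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* output unit index   */
    int j = blockIdx.y * blockDim.y + threadIdx.y;   /* frame within block  */
    if (i < n_out && j < block) {
        float v = y[j * n_out + i] + bias[i];        /* column-major layout */
        y[j * n_out + i] = 1.0f / (1.0f + expf(-v));
    }
}

/* Y(n_out x block) = sigmoid( W(n_out x n_in) * X(n_in x block) + bias ),
 * all matrices already resident on the device, column-major. */
void layer_forward_block(cublasHandle_t h,
                         const float *d_W, const float *d_bias,
                         const float *d_X, float *d_Y,
                         int n_in, int n_out, int block)
{
    const float one = 1.0f, zero = 0.0f;
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                n_out, block, n_in,
                &one, d_W, n_out, d_X, n_in,
                &zero, d_Y, n_out);

    dim3 t(16, 16);
    dim3 g((n_out + t.x - 1) / t.x, (block + t.y - 1) / t.y);
    bias_sigmoid<<<g, t>>>(d_Y, d_bias, n_out, block);
}

The same batching applies to the back-propagation step: grouping frames into blocks trades a small amount of gradient freshness for much larger matrix products, which is where the GPU speed-up over the MKL-based CPU implementation comes from.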