high performance computing on graphics processing units: hgpu.org


Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization

Viacheslav Khomenko, Oleg Shyshkov, Olga Radyvonenko, Kostiantyn Bokhan

Samsung RnD Institute Ukraine (SRK), 57, L’va Tolstogo Str., Kyiv, 01032, Ukraine

arXiv:1708.05604 [cs.LG], (18 Aug 2017)

DOI:10.1109/DSMP.2016.7583516


An efficient algorithm for recurrent neural network training is presented. The approach increases training speed for tasks where the length of the input sequence may vary significantly. It is based on optimal batch bucketing by input sequence length and data parallelization on multiple graphics processing units. The baseline training performance without sequence bucketing is compared with the proposed solution for different numbers of buckets. An example is given for the online handwriting recognition task using an LSTM recurrent neural network. The evaluation is performed in terms of wall-clock time, number of epochs, and validation loss value.
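To make the bucketing idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe how the optimal bucket boundaries are chosen, so this sketch simply assumes equal-count buckets over the length-sorted data. The names `bucket_batches` and `padding_fraction`, and all parameters, are hypothetical. With `num_buckets=1` the function reduces to ordinary shuffled batching, which plays the role of the baseline without bucketing.

```python
import random


def bucket_batches(sequences, num_buckets, batch_size, pad_value=0, seed=0):
    """Group sequences into length buckets, then cut padded batches.

    Assumption: buckets hold equal numbers of sequences; the paper's
    optimal bucket boundaries are not reproduced here.
    """
    if not sequences:
        return []
    rng = random.Random(seed)
    # Sort indices by length so each bucket holds similar-length sequences.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    bucket_size = -(-len(order) // num_buckets)  # ceiling division
    batches = []
    for start in range(0, len(order), bucket_size):
        bucket = order[start:start + bucket_size]
        # Shuffle inside the bucket so batches stay random within it.
        rng.shuffle(bucket)
        for b in range(0, len(bucket), batch_size):
            chosen = [sequences[i] for i in bucket[b:b + batch_size]]
            max_len = max(len(s) for s in chosen)
            # Pad each sequence to the longest one in its own batch.
            batches.append(
                [list(s) + [pad_value] * (max_len - len(s)) for s in chosen]
            )
    return batches


def padding_fraction(batches, sequences):
    """Fraction of all batch elements that are padding."""
    padded = sum(len(row) for batch in batches for row in batch)
    real = sum(len(s) for s in sequences)
    return 1.0 - real / padded


# Synthetic data: 200 sequences with lengths 1..100, each length twice.
data = [[1] * ((i * 37) % 100 + 1) for i in range(200)]
baseline = bucket_batches(data, num_buckets=1, batch_size=16)
bucketed = bucket_batches(data, num_buckets=10, batch_size=16)
print("padding without bucketing:", padding_fraction(baseline, data))
print("padding with 10 buckets:  ", padding_fraction(bucketed, data))
```

The padding fraction is only a rough proxy for wasted computation. In a multi-GPU data-parallel setup, each padded batch would additionally be split across devices; that step is omitted from this sketch.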

Tags: Algorithms, Computer science, Deep learning, LSTM, Neural networks, nVidia, Tesla K40, Theano

September 7, 2017 by hgpu

