high performance computing on graphics processing units: hgpu.org

Single stream parallelization of generalized LSTM-like RNNs on a GPU

Kyuyeon Hwang, Wonyong Sung

Department of Electrical and Computer Engineering, Seoul National University, Seoul 151-744, South Korea

arXiv:1503.02852 [cs.NE], (10 Mar 2015)

@article{hwang2015single,
  title={Single stream parallelization of generalized LSTM-like RNNs on a GPU},
  author={Hwang, Kyuyeon and Sung, Wonyong},
  year={2015},
  month={mar},
  archivePrefix={arXiv},
  primaryClass={cs.NE}
}

Recurrent neural networks (RNNs) have shown outstanding performance in processing sequence data. However, they suffer from long training times, which call for parallel implementations of the training procedure. Parallelizing RNN training algorithms is very challenging because internal recurrent paths create dependencies between two different time frames. In this paper, we first propose a generalized graph-based RNN structure that covers the most popular long short-term memory (LSTM) network. Then, we present a parallelization approach that automatically explores the parallelism of arbitrary RNNs by analyzing the graph structure. The experimental results show that the proposed approach achieves a substantial speed-up even with a single training stream, and accelerates training further when combined with multiple parallel training streams.
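
To illustrate the kind of graph analysis the abstract alludes to, the following is a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a hypothetical representation in which a network is a list of `(source, destination, delay)` edges, and it uses a simple rule: a node that lies on no cycle has no dependency on its own past output, so it can be evaluated for all time frames at once, while a node on a cycle must be stepped frame by frame. The function names and the toy network are invented for this example.

```python
from collections import defaultdict


def reachable(adj, start):
    """Return the set of nodes reachable from `start` via one or more edges."""
    seen, stack = set(), list(adj[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    return seen


def split_by_recurrence(edges):
    """Split nodes into (frame-parallel, frame-sequential) lists.

    `edges` holds (source, destination, delay) triples. The delay is kept
    only to document which connections cross time frames; this simplified
    sketch classifies nodes purely by whether they sit on a cycle.
    """
    adj = defaultdict(list)
    nodes = set()
    for src, dst, _delay in edges:
        adj[src].append(dst)
        nodes.update((src, dst))
    # A node is recurrent if it can reach itself through the graph.
    recurrent = {n for n in nodes if n in reachable(adj, n)}
    return sorted(nodes - recurrent), sorted(recurrent)


# Hypothetical toy network: the hidden layer feeds itself with a one-frame delay.
edges = [
    ("input", "hidden", 0),
    ("hidden", "hidden", 1),
    ("hidden", "output", 0),
]
parallel, sequential = split_by_recurrence(edges)
print(parallel)    # ['input', 'output']
print(sequential)  # ['hidden']
```

In this toy case the `output` node still depends on `hidden`, so it can be batched over all frames only after the sequential pass over `hidden` has finished; a fuller analysis would also order the groups topologically.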

Tags: Algorithms, Computer science, CUDA, Machine learning, Neural and Evolutionary Computing, Neural networks, nVidia, Tesla K40

March 22, 2015 by hgpu
