high performance computing on graphics processing units: hgpu.org


Learning Sparse Recurrent Neural Networks in Language Modeling

Yuanlong Shao

The Ohio State University

The Ohio State University, 2014


In the context of statistical language modeling, we explored the task of learning an Elman network with sparse weight matrices, as a pilot study towards learning a sparsely connected, fully recurrent neural network, which would potentially be useful in many cases. We also explored how efficient and scalable this can be in practice. In particular, we explored these tasks: (1) We adapted the Iterative Hard Thresholding (IHT) algorithm to BackPropagation Through Time (BPTT) learning. (2) To accelerate convergence of the IHT algorithm, we designed a scheme for expanding the network by replicating the existing hidden neurons, so that training can start from a small, dense network that has already been learned. (3) We implemented this algorithm on the GPU. With small mini-batch sizes and large network sizes (e.g., 2000 hidden neurons), it achieves a 160-fold speedup over the RNNLM toolkit on the CPU. With larger mini-batch sizes there could be a further 10-fold speedup, though the convergence rate becomes an issue in such cases and further effort is needed to address this problem. (4) Lacking a theoretical convergence guarantee for the IHT algorithm in our problem setting, we conducted an empirical study showing that learning a sparse network does give competitive perplexity in language modeling. In particular, we showed that a sparse network learned in this way can outperform a dense network when the number of effective parameters is kept the same. (5) We gathered performance metrics comparing the computational efficiency of the matrix operations of interest in both sparse and dense settings. The results suggest that, for network sizes we can currently train in reasonable time, sparse matrices offer no computational advantage over dense matrices unless the networks are allowed to be very sparse. Thus for research purposes we may want to focus on using dense matrices, while for engineering purposes a more flexible matrix design that leverages the strengths of both dense and sparse matrices might be necessary.
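
For readers unfamiliar with Iterative Hard Thresholding, the following is a minimal sketch of the general idea named in item (1): take a gradient step, then keep only the `k` largest-magnitude weights and set the rest to zero. It is not the thesis's implementation. The function names `hard_threshold` and `iht_step`, the learning rate `lr`, and the use of nested Python lists for a weight matrix are all assumptions made for illustration, and the gradient is passed in as a placeholder for whatever BPTT would compute.

```python
def hard_threshold(weights, k):
    """Keep the k largest-magnitude entries of a 2-D list; zero the rest.

    Hypothetical helper for illustration; ties are broken by position
    (row, then column) so the result is deterministic.
    """
    entries = [
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    ]
    # Largest magnitude first, then lowest (row, column) index.
    entries.sort(key=lambda t: (-t[0], t[1], t[2]))
    keep = {(i, j) for _, i, j in entries[:k]}
    return [
        [w if (i, j) in keep else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


def iht_step(weights, grads, lr, k):
    """One IHT iteration: plain gradient step followed by projection
    onto the set of matrices with at most k nonzero entries.

    `grads` stands in for a gradient obtained elsewhere (e.g. by BPTT).
    """
    updated = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    return hard_threshold(updated, k)


if __name__ == "__main__":
    W = [[0.5, -0.1], [0.05, -2.0]]
    G = [[0.0, 1.0], [0.0, 0.0]]  # placeholder gradient
    print(iht_step(W, G, lr=0.1, k=2))
```

In this toy run only the two largest-magnitude weights survive the projection. A GPU implementation would more plausibly apply the same projection to dense or sparse device matrices, but that design is not specified in the abstract.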

Tags: Computer science, CUDA, Neural networks, nVidia, nVidia GeForce GTX Titan, Thesis

May 7, 2014 by hgpu
