high performance computing on graphics processing units: hgpu.org

SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska

Brown University

arXiv:1801.04380 [cs.DC], (13 Jan 2018)

DOI:10.1145/3178487.3178491

@article{wang2018superneurons,
  title={SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks},
  author={Wang, Linnan and Ye, Jinmian and Zhao, Yiyang and Wu, Wei and Li, Ang and Song, Shuaiwen Leon and Xu, Zenglin and Kraska, Tim},
  year={2018},
  month={jan},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  doi={10.1145/3178487.3178491}
}

Going deeper and wider in neural architectures improves accuracy, but limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners must either switch to less desirable network architectures or nontrivially dissect a network across multiple GPUs. Both distract DL practitioners from their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity. SuperNeurons features 3 memory optimizations: Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation. Together they reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in these memory-saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow demonstrate that SuperNeurons trains at least 3.2432× deeper networks than current ones with leading performance. In particular, SuperNeurons can train ResNet2500, which has $10^4$ basic network layers, on a 12GB K40c.
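To make the idea of liveness analysis in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the SuperNeurons implementation: the function names `chain_schedule` and `peak_memory`, the linear-chain network, and the uniform tensor sizes are all hypothetical. It models only liveness analysis (freeing a tensor right after its last use) and does not model the Unified Tensor Pool or Cost-Aware Recomputation.

```python
def chain_schedule(num_layers):
    """Build a toy execution schedule for a linear chain of layers.

    Each step is (produced, consumed): tuples of tensor names.
    a{i} are activations, g{i} are gradients (hypothetical naming).
    """
    steps = [(("a0",), ())]  # load the input batch
    for i in range(num_layers):  # forward pass
        steps.append(((f"a{i + 1}",), (f"a{i}",)))
    n = num_layers
    steps.append(((f"g{n}",), (f"a{n}",)))  # loss gradient
    for i in reversed(range(num_layers)):  # backward pass
        steps.append(((f"g{i}",), (f"a{i}", f"g{i + 1}")))
    return steps


def peak_memory(steps, sizes, free_dead=True):
    """Simulate memory use over the schedule and return the peak.

    With free_dead=True, a tensor is released right after the last
    step that produces or consumes it (liveness analysis).
    With free_dead=False, nothing is released (naive baseline).
    """
    # Record the last step at which each tensor is touched.
    last_use = {}
    for idx, (produced, consumed) in enumerate(steps):
        for name in (*produced, *consumed):
            last_use[name] = idx

    live, current, peak = set(), 0, 0
    for idx, (produced, _consumed) in enumerate(steps):
        for name in produced:
            if name not in live:
                live.add(name)
                current += sizes[name]
        peak = max(peak, current)
        if free_dead:
            for name in [t for t in live if last_use[t] == idx]:
                live.remove(name)
                current -= sizes[name]
    return peak


if __name__ == "__main__":
    layers = 3
    schedule = chain_schedule(layers)
    # Assumption: every tensor has size 1 (arbitrary units).
    unit_sizes = {name: 1 for produced, _ in schedule for name in produced}
    print("baseline peak:", peak_memory(schedule, unit_sizes, free_dead=False))
    print("liveness peak:", peak_memory(schedule, unit_sizes, free_dead=True))
```

In this toy model with 3 layers and unit-sized tensors, the baseline peak is 8 and the liveness peak is 5. Activations still accumulate through the forward pass because the backward pass needs them, which is consistent with the abstract combining liveness analysis with two further optimizations to reach the per-layer maximum.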

Tags: Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, Tesla K40

January 20, 2018 by hgpu
