Memory Optimization for Deep Networks

Aashaka Shah, Chao-Yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krähenbühl
Department of Computer Science, University of Texas at Austin
arXiv:2010.14501 [cs.LG], 29 Oct 2020

@misc{shah2020memory,
   title={Memory Optimization for Deep Networks},
   author={Aashaka Shah and Chao-Yuan Wu and Jayashree Mohan and Vijay Chidambaram and Philipp Krähenbühl},
   year={2020},
   eprint={2010.14501},
   archivePrefix={arXiv},
   primaryClass={cs.LG}
}

Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x. This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the implementation of various operators. MONeT is able to outperform all prior hand-tuned operations as well as automated checkpointing. MONeT reduces the overall memory requirement by 3x for various PyTorch models, with a 9-16% overhead in computation. For the same computation cost, MONeT requires 1.2-1.8x less memory than current state-of-the-art automated checkpointing frameworks. Our code is available.
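Checkpointing, the technique the abstract refers to, trades memory for recomputation: intermediate activations are discarded during the forward pass and recomputed on demand during backward. The sketch below illustrates this trade-off using PyTorch's standard torch.utils.checkpoint utility; it is not MONeT itself, and the toy model and segment count are illustrative assumptions.

    # Minimal sketch of activation checkpointing in PyTorch.
    # This is the manual, fixed-schedule form of the memory/compute
    # trade-off that MONeT automates; the model here is hypothetical.
    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint_sequential

    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(),
        nn.Linear(4096, 4096), nn.ReLU(),
        nn.Linear(4096, 1024),
    )

    x = torch.randn(64, 1024, requires_grad=True)

    # Split the network into 2 segments: only activations at segment
    # boundaries are kept; interior activations are recomputed in the
    # backward pass, cutting peak memory at the cost of extra compute.
    out = checkpoint_sequential(model, 2, x)
    loss = out.sum()
    loss.backward()

A hand-written schedule like this fixes which activations to keep in advance; MONeT's contribution is to search over such checkpointing schedules jointly with the implementation of individual operators.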