Memory Efficient Mixed-Precision Optimizers
Machine Learning Optimization Laboratory, École Polytechnique Fédérale de Lausanne
arXiv:2309.12381 [cs.LG] (21 Sep 2023)
@misc{lewandowski2023memory,
  title={Memory Efficient Mixed-Precision Optimizers},
  author={Basile Lewandowski and Atli Kosson},
  year={2023},
  eprint={2309.12381},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
Traditional optimization methods rely on single-precision floating point arithmetic, which is costly in both memory and compute. Mixed-precision optimization techniques instead leverage both single- and half-precision floating point arithmetic to reduce memory requirements while maintaining model accuracy. We provide an algorithm that further reduces memory usage during training by eliminating the single-precision copy of the parameters, so that effectively only half-precision numbers are kept. We also explore the benefits of discarding gradient buffers by executing the optimizer step during back-propagation. In practice, we achieve up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
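The abstract only summarizes the two ideas, so the following is a minimal sketch rather than the authors' implementation: it keeps parameters and optimizer state purely in half precision (no single-precision master copy) and applies a plain SGD-with-momentum update inside the backward pass so each gradient buffer can be freed immediately. It assumes PyTorch >= 2.1 (for register_post_accumulate_grad_hook) and a CUDA device; the paper's actual optimizer and any accuracy-preserving tricks for the FP16-only update are not detailed here.

# Sketch of (1) half-precision-only parameters/state and (2) the optimizer
# step fused into back-propagation. Not the authors' code; assumptions noted above.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).half().cuda()

lr, momentum = 1e-3, 0.9
# Optimizer state is also kept in half precision here; the paper may use a
# more careful scheme to preserve accuracy, which the abstract does not specify.
velocity = {p: torch.zeros_like(p) for p in model.parameters()}

def sgd_step_in_backward(param: torch.Tensor) -> None:
    # Fires right after param.grad has been accumulated during backward.
    with torch.no_grad():
        v = velocity[param]
        v.mul_(momentum).add_(param.grad)
        param.add_(v, alpha=-lr)
    param.grad = None  # free the gradient buffer immediately

for p in model.parameters():
    p.register_post_accumulate_grad_hook(sgd_step_in_backward)

# Training step: no optimizer.step() or zero_grad() afterwards, since every
# parameter is updated (and its gradient freed) during the backward pass.
x = torch.randn(32, 1024, device="cuda", dtype=torch.float16)
loss = model(x).float().pow(2).mean()
loss.backward()

Because gradients are consumed and released layer by layer, peak memory no longer includes a full set of gradient buffers or an FP32 parameter copy, which is the kind of saving the abstract's 25% peak-memory figure refers to.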
October 1, 2023 by hgpu