high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training

Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training

Mariko Tatsumi, Silviu-Ioan Filip, Caroline White, Olivier Sentieys, Guy Lemieux

Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada

hal-03885471, (December 5, 2022)

BibTeX

Download (PDF)

View

Source

997

views

The most compute-intensive stage of deep neural network (DNN) training is matrix multiplication where the multiply-accumulate (MAC) operator is key. To reduce training costs, we consider using low-precision arithmetic for MAC operations. While low-precision training has been investigated in prior work, the focus has been on reducing the number of bits in weights or activations without compromising accuracy. In contrast, the focus in this paper is on implementation details beyond weight or activation width that affect area and accuracy. In particular, we investigate the impact of fixed-versus floating-point representations, multiplier rounding, and floatingpoint exceptional value support. Results suggest that (1) lowprecision floating-point is more area-effective than fixed-point for multiplication, (2) standard IEEE-754 rules for subnormals, NaNs, and intermediate rounding serve little to no value in terms of accuracy but contribute significantly to area, (3) lowprecision MACs require an adaptive loss-scaling step during training to compensate for limited representation range, and (4) fixed-point is more area-effective for accumulation, but the cost of format conversion and downstream logic can swamp the savings. Finally, we note that future work should investigate accumulation structures beyond the MAC level to achieve further gains.

Tags: Computer science, CUDA, Deep learning, FPGA, nVidia, Precision

December 11, 2022 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training

Your response

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)

Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)