high performance computing on graphics processing units: hgpu.org

Mixed precision in Graphics Processing Unit

Quentin Gallouédec

École Centrale de Lyon

arXiv:2110.12794 [cs.AR], (25 Oct 2021)

Modern graphics processing units (GPUs) are designed and optimized to perform highly parallel numerical calculations. This parallelism has enabled, and continues to promise, significant gains in both energy efficiency and computational performance. In this document, we take stock of the different applications of mixed precision. We recall the numerical computation standards used by the overwhelming majority of systems. We show that mixed precision, which lowers the precision of an operation's inputs, does not necessarily lower the precision of its output. We show that this principle carries over to one of the fields most in need of computing power: machine learning. Fixed-point numbers and half precision are two very effective ways to increase the learning capacity of complex neural networks. Mixed precision nevertheless requires suitable hardware; without it, computation time may actually increase. The NVIDIA Tensor Core, found in the Tesla V100 range among others, is an example of a hardware-level implementation of mixed precision. Moreover, by abandoning the traditional von Neumann model, mixed precision can also be applied at a lower level of abstraction, using phase-change memory.
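
The claim that lower-precision inputs need not degrade the output can be illustrated with a small sketch. It is not taken from the paper: the helper names (`to_half`, `to_single`, `dot`) and the test data are assumptions made for illustration, and the rounding is emulated in software with Python's `struct` module rather than run on a GPU. The sketch computes a dot product of half-precision inputs twice, once with a half-precision accumulator and once with a single-precision accumulator, and compares both against a double-precision reference.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (half-precision) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (single-precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_fn):
    """Dot product whose products and running sum are rounded by round_fn."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_fn(acc + round_fn(x * y))
    return acc


# Hypothetical test data: 4096 copies of 0.1, stored in half precision.
n = 4096
a = [to_half(0.1)] * n
b = [to_half(0.1)] * n

# Double-precision reference computed from the same half-precision inputs.
reference = math.fsum(x * y for x, y in zip(a, b))

half_result = dot(a, b, to_half)     # half inputs, half accumulator
mixed_result = dot(a, b, to_single)  # half inputs, single accumulator

print("reference:       ", reference)
print("half accumulate: ", half_result, "error", abs(half_result - reference))
print("mixed accumulate:", mixed_result, "error", abs(mixed_result - reference))
```

With these inputs, the half-precision accumulator eventually stops growing because each new product is smaller than half the spacing between representable values near the running sum, whereas the single-precision accumulator stays close to the reference. Both runs use the same half-precision inputs, so the difference comes only from the precision of the accumulation.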

Tags: Computer science, Machine learning, Mixed precision, Neural networks, nVidia, Tesla V100

October 31, 2021 by hgpu
