high performance computing on graphics processing units: hgpu.org


How to Train BERT with an Academic Budget

Peter Izsak, Moshe Berchansky, Omer Levy

Intel Labs

arXiv:2104.07705 [cs.CL], (15 Apr 2021)


GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power of multiple GPUs is challenging. The main obstacles to good performance are memory layout (which affects memory bandwidth), effective use of the memory spaces within a GPU, inter-GPU communication, and synchronization. We address these problems with the Ripple library, which provides a unified view of the computational space across multiple dimensions and multiple GPUs, allows polymorphic data layout, and provides a simple graph interface to describe an algorithm, from which inter-GPU data transfers can be optimally scheduled. We describe the abstractions Ripple provides that allow complex computations to be described simply and executed efficiently across many GPUs with minimal overhead. We show performance results for a number of examples, from particle motion to finite-volume methods and the eikonal equation, as well as good strong and weak scaling across multiple GPUs.
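The abstract names memory layout as a factor in memory bandwidth. As a rough sketch, assuming float32 fields, 32-thread warps, and 32-byte memory sectors (and ignoring caches entirely), the snippet below counts how many sectors one warp touches when each thread reads the x coordinate of a 3-float particle record stored as an array of structures (AoS) versus a structure of arrays (SoA). The constants and helper names are hypothetical and do not reflect Ripple's actual interface.

```python
# Hypothetical illustration (not Ripple's API): compare how many memory
# sectors a warp touches when reading one field under two data layouts.

FLOAT_BYTES = 4    # assumed 4-byte (float32) fields
FIELDS = 3         # assumed particle record: x, y, z
WARP_SIZE = 32     # threads that issue their loads together
SECTOR_BYTES = 32  # assumed memory transaction granularity


def aos_offset(i: int, field: int) -> int:
    """Byte offset of `field` of element i in array-of-structures layout."""
    return (i * FIELDS + field) * FLOAT_BYTES


def soa_offset(i: int, field: int, n: int) -> int:
    """Byte offset of `field` of element i in structure-of-arrays layout."""
    return (field * n + i) * FLOAT_BYTES


def sectors_touched(offsets) -> int:
    """Count distinct sectors covered by 4-byte loads at the given offsets."""
    sectors = set()
    for off in offsets:
        sectors.add(off // SECTOR_BYTES)                      # first byte
        sectors.add((off + FLOAT_BYTES - 1) // SECTOR_BYTES)  # last byte
    return len(sectors)


n = 1024  # number of particles (only matters for SoA field offsets)
aos = sectors_touched(aos_offset(t, 0) for t in range(WARP_SIZE))
soa = sectors_touched(soa_offset(t, 0, n) for t in range(WARP_SIZE))
print(f"AoS: {aos} sectors, SoA: {soa} sectors")
```

Under these assumptions it prints `AoS: 12 sectors, SoA: 4 sectors`: the strided AoS reads spread one warp's loads over three times as many sectors, which is one reason a library might let the layout vary per data structure rather than fix it.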

Tags: Artificial intelligence, Computer science, Deep learning, Machine learning, NLP, nVidia, nVidia GeForce GTX Titan V, Tesla V100

April 25, 2021 by hgpu




Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
