high performance computing on graphics processing units: hgpu.org

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui

School of Computer Science & Key Lab of High Confidence Software Technologies (MOE), Peking University

arXiv:2211.13878 [cs.LG], (25 Nov 2022)

DOI:10.48550/arXiv.2211.13878

Package: HETU, a high-performance distributed deep learning system targeting large-scale and automated distributed training

Transformer models have achieved state-of-the-art performance in various application domains and have gradually become the foundation of advanced large deep learning (DL) models. However, training these models efficiently over multiple GPUs remains challenging because of the large number of parallelism choices. Existing DL systems either rely on manual effort to craft distributed training plans or apply parallelism combinations within a very limited search space. In this work, we propose Galvatron, a new system framework that incorporates multiple popular parallelism dimensions and automatically finds the most efficient hybrid parallelism strategy. To better explore such a huge search space, we 1) use a decision tree to decompose and prune it based on some reasonable intuitions, and then 2) design a dynamic programming search algorithm to generate the optimal plan. Evaluations on four representative Transformer workloads show that Galvatron can automatically perform distributed training under different GPU memory budgets. In all evaluated scenarios, Galvatron achieves higher system throughput than previous work with limited parallelism.
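To make the dynamic programming idea in the abstract concrete, here is a minimal sketch in Python. It is not Galvatron's algorithm or the HETU API: the function name `search_plan`, the strategy names, and all time and memory numbers are hypothetical. It also assumes that each layer's cost is independent of its neighbours, so any cost of switching strategies between adjacent layers is ignored. Under those assumptions, choosing one strategy per layer to minimize total time within a memory budget becomes a knapsack-style search.

```python
from math import inf


def search_plan(layers, memory_budget):
    """Pick one strategy per layer, minimizing total time within a memory budget.

    layers: list of dicts mapping strategy name -> (time, memory),
            with memory given in whole units (hypothetical cost model).
    Returns (total_time, [strategy names]) or (inf, None) if nothing fits.
    """
    # best[m] = (time, plan): cheapest way to place the layers seen so far
    # using exactly m memory units.
    best = {0: (0, [])}
    for candidates in layers:
        nxt = {}
        for used, (time_so_far, plan) in best.items():
            for name, (time, memory) in candidates.items():
                total_memory = used + memory
                if total_memory > memory_budget:
                    continue  # prune states that exceed the budget
                state = (time_so_far + time, plan + [name])
                if total_memory not in nxt or state[0] < nxt[total_memory][0]:
                    nxt[total_memory] = state
        best = nxt
    if not best:
        return inf, None
    return min(best.values(), key=lambda state: state[0])


# Hypothetical per-layer costs: (time in ms, memory in units).
strategies = {"data": (10, 4), "tensor": (14, 2), "sharded_data": (18, 1)}
layers = [strategies] * 3

total_time, plan = search_plan(layers, memory_budget=8)
print(total_time, plan)
```

With these made-up numbers, a budget of 8 units cannot hold the fastest strategy on every layer, so the search mixes strategies; a larger budget lets it pick the fastest one everywhere, and a budget that is too small yields no plan at all.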

Tags: Computer science, CUDA, Databases, Deep learning, nVidia, nVidia Titan RTX, Package

December 4, 2022 by hgpu
