How to Train BERT with an Academic Budget
title={How to Train BERT with an Academic Budget},
author={Peter Izsak and Moshe Berchansky and Omer Levy},
year={2021},
eprint={2104.07705},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting memory bandwidth, effective use of the memory spaces with a GPU, inter-GPU communication, and synchronization. We address these problems with the Ripple library, which provides a unified view of the computational space across multiple dimensions and multiple GPUs, allows polymorphic data layout, and provides a simple graph interface to describe an algorithm from which inter-GPU data transfers can be optimally scheduled. We describe the abstractions provided by Ripple to allow complex computations to be described simply, and to execute efficiently across many GPUs with minimal overhead. We show performance results for a number of examples, from particle motion to finite-volume methods and the eikonal equation, as well as showing good strong and weak scaling results across multiple GPUs.