high performance computing on graphics processing units: hgpu.org

Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach

Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko

University of Toronto

arXiv:2102.00527 [cs.LG], (31 Jan 2021)

@misc{yu2021computational,
  title={Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach},
  author={Geoffrey X. Yu and Yubo Gao and Pavel Golikov and Gennady Pekhimenko},
  year={2021},
  eprint={2102.00527},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Deep learning researchers and practitioners usually use GPUs to train their deep neural networks (DNNs) faster. However, choosing which GPU to use is challenging because (i) there are many options, and (ii) users grapple with competing concerns: maximizing compute performance while minimizing costs. In this work, we present a new practical technique to help users make informed and cost-efficient GPU selections: make performance predictions with the help of a GPU that the user already has. Our technique exploits the observation that, because DNN training consists of repetitive compute steps, predicting the execution time of a single iteration is usually enough to characterize the performance of an entire training process. We make predictions by scaling the execution time of each operation in a training iteration from one GPU to another using either (i) wave scaling, a technique based on a GPU’s execution model, or (ii) pre-trained multilayer perceptrons. We implement our technique in a Python library called Surfer and find that it makes accurate iteration execution time predictions on ResNet-50, Inception v3, the Transformer, GNMT, and DCGAN across six different GPU architectures. Surfer currently supports PyTorch, is easy to use, and requires only a few lines of code.
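The per-operation scaling idea in the abstract can be illustrated with a minimal sketch. This is not Surfer's API and not the paper's wave scaling formula: the `Op` type, the function names, and the single throughput ratio used as a scaling factor are all hypothetical stand-ins chosen for illustration. The sketch only shows the overall structure: measure each operation on the GPU you have, scale each measurement to the target GPU, and sum the results into one iteration time.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Op:
    """One operation in a training iteration (hypothetical type)."""
    name: str
    time_ms: float  # execution time measured on the origin GPU


def predict_iteration_ms(ops: Iterable[Op], scale_for_op: Callable[[Op], float]) -> float:
    """Scale each measured operation time to the target GPU and sum them."""
    return sum(op.time_ms * scale_for_op(op) for op in ops)


def uniform_ratio(origin_throughput: float, dest_throughput: float) -> Callable[[Op], float]:
    """Assumed stand-in for a real per-operation scaling model:
    every operation is scaled by the same throughput ratio."""
    ratio = origin_throughput / dest_throughput
    return lambda op: ratio


def estimate_training_s(iteration_ms: float, num_iterations: int) -> float:
    """Training repeats the same iteration, so total time is iteration time times count."""
    return iteration_ms * num_iterations / 1000.0


# Made-up measurements on the origin GPU, for illustration only.
measured = [Op("conv2d", 4.0), Op("batch_norm", 1.0), Op("linear", 3.0)]

# Assume the target GPU has twice the relevant throughput of the origin GPU.
scale = uniform_ratio(origin_throughput=1.0, dest_throughput=2.0)

predicted_ms = predict_iteration_ms(measured, scale)
total_s = estimate_training_s(predicted_ms, num_iterations=10_000)
```

In a real predictor, `scale_for_op` would differ per operation (for example, a model of the GPU's execution behavior for some kernels and a learned model for others), which is why it is passed in as a function rather than a constant.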

Tags: Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, nVidia GeForce RTX 2070, Performance, Python, Tesla P100, Tesla V100

February 7, 2021 by hgpu
