high performance computing on graphics processing units: hgpu.org

ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time

Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry

Computer Science Department, Carnegie Mellon University, USA

arXiv:2305.10611 [cs.LG], (17 May 2023)

DOI:10.48550/arXiv.2305.10611

Dynamic control flow is an important technique often used to design expressive and efficient deep learning computations for applications such as text parsing, machine translation, and early exit from deep models. However, the resulting control flow divergence makes batching, an important performance optimization, difficult to perform manually. In this paper, we present ACRoBat, a framework that enables efficient automatic batching for dynamic deep learning computations by performing hybrid static+dynamic compiler optimizations and end-to-end tensor code generation. ACRoBat performs up to 8.5X better than DyNet, a state-of-the-art framework for automatic batching, on an Nvidia GeForce RTX 3070 GPU.
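
To make the batching problem concrete, here is a minimal Python sketch of why input-dependent control flow complicates batching and what grouping operations across inputs buys. It is an illustration only, not ACRoBat's method: the toy `cell` function stands in for a tensor kernel, and the names `cell`, `run_unbatched`, and `run_batched` are hypothetical. Each input sequence has a different length, so each instance runs its loop a different number of times; the batched version gathers the instances that are still active at each step and issues one call for all of them.

```python
from typing import List, Tuple


def cell(states: List[float], inputs: List[float]) -> List[float]:
    """Toy stand-in for a kernel: one call processes a whole batch."""
    return [0.5 * s + x for s, x in zip(states, inputs)]


def run_unbatched(seqs: List[List[float]]) -> Tuple[List[float], int]:
    """Process each sequence on its own: one kernel call per element."""
    calls = 0
    finals = []
    for seq in seqs:
        state = 0.0
        for x in seq:  # loop trip count depends on the input
            state = cell([state], [x])[0]
            calls += 1
        finals.append(state)
    return finals, calls


def run_batched(seqs: List[List[float]]) -> Tuple[List[float], int]:
    """Group the same step of all still-active sequences into one call."""
    calls = 0
    states = [0.0] * len(seqs)
    max_len = max((len(s) for s in seqs), default=0)
    for t in range(max_len):
        # Instances whose loop has not yet terminated at step t.
        active = [i for i, s in enumerate(seqs) if t < len(s)]
        new_states = cell([states[i] for i in active],
                          [seqs[i][t] for i in active])
        calls += 1
        for i, s in zip(active, new_states):
            states[i] = s
    return states, calls


seqs = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
print(run_unbatched(seqs))  # same final states, six kernel calls
print(run_batched(seqs))    # same final states, three kernel calls
```

Both functions compute the same results, but the batched version issues one call per time step rather than one per element. In this sketch the grouping is written by hand for a single loop shape; the abstract's point is that doing this manually becomes difficult as control flow diverges, which is the work an auto-batching framework takes over.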

Tags: Code generation, Computer science, CUDA, Deep learning, nVidia, nVidia GeForce RTX 3070

May 28, 2023 by hgpu
