Compiler-centric across-stack deep learning acceleration

Perry Gibson
School of Computing Science, College of Science and Engineering, University of Glasgow
University of Glasgow, 2023

@phdthesis{gibson2023compiler,
   title={Compiler-centric across-stack deep learning acceleration},
   author={Gibson, Perry},
   year={2023},
   school={University of Glasgow}
}

Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning approaches increasingly providing state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive to deploy, for example in terms of inference time or memory usage. Effective exploration of the design space requires a holistic approach, spanning topics from machine learning, systems, and hardware. The rapid proliferation of deep learning applications has raised demand for efficient exploration and acceleration of deep-learning-based solutions. However, managing the range of optimization techniques, as well as how they interact with each other across the stack, is a non-trivial task. A family of emerging specialized compilers for deep learning, tensor compilers, appears to be a strong candidate to help manage the complexity of across-stack optimization choices and enable new approaches. This thesis presents new techniques and explorations of the Deep Learning Acceleration Stack (DLAS), with the perspective that the tensor compiler will increasingly be the center of this stack. First, we motivate the challenges in exploring DLAS by describing the experience of running a perturbation study that varies parameters at every layer of the stack. The core of the study is implemented using a tensor compiler, which reduces the complexity of evaluating the wide range of variants, although it still requires significant engineering effort to realize. Next, we develop a new algorithm for grouped convolution, a model optimization technique for which existing solutions provided poor inference-time scaling. We implement and optimize our algorithm using a tensor compiler, outperforming existing approaches by 5.1x on average (arithmetic mean). Finally, we propose transfer-tuning, a technique to reduce the search time required for automatic tensor compiler code optimization, cutting it by 6.5x on average. The techniques and contributions of this thesis across these interconnected domains demonstrate the exciting potential of tensor compilers to simplify and improve design space exploration for DNNs and their deployment. The outcomes of this thesis open new lines of research that help machine learning developers keep up with the rapidly evolving landscape of neural architectures and hardware.
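For context, grouped convolution splits the input channels into G independent groups, and each output channel is computed from only its group's slice of the input, which reduces both parameters and arithmetic by roughly a factor of G compared with a dense convolution. The sketch below is a minimal NumPy illustration of the operation itself (stride 1, no padding, NCHW layout), not the optimized algorithm or tensor compiler schedules developed in the thesis; the function name and layout conventions are assumptions for illustration.

import numpy as np

def grouped_conv2d(x, weight, groups):
    # Illustrative grouped convolution (not the thesis's optimized algorithm).
    # x: (N, C_in, H, W); weight: (C_out, C_in // groups, KH, KW); stride 1, no padding.
    n, c_in, h, w = x.shape
    c_out, c_in_g, kh, kw = weight.shape
    assert c_in == c_in_g * groups and c_out % groups == 0
    c_out_g = c_out // groups
    oh, ow = h - kh + 1, w - kw + 1
    out = np.zeros((n, c_out, oh, ow), dtype=x.dtype)
    for g in range(groups):
        # Each group convolves its own channel slice with its own filters.
        xs = x[:, g * c_in_g:(g + 1) * c_in_g]
        ws = weight[g * c_out_g:(g + 1) * c_out_g]
        for i in range(oh):
            for j in range(ow):
                patch = xs[:, :, i:i + kh, j:j + kw]  # (N, C_in/groups, KH, KW)
                out[:, g * c_out_g:(g + 1) * c_out_g, i, j] = np.tensordot(
                    patch, ws, axes=([1, 2, 3], [1, 2, 3]))
    return out

In the thesis itself, this computation pattern is implemented and optimized inside a tensor compiler rather than in NumPy, which is where the reported 5.1x average speedup comes from.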