Optimization of Compiler-generated OpenCL CNN Kernels and Runtime for FPGAs

Seung-Hun Chung
University of Toronto, 2021

@phdthesis{chung2021optimization,
   title={Optimization of Compiler-generated OpenCL CNN Kernels and Runtime for FPGAs},
   author={Chung, Seung-Hun},
   school={University of Toronto},
   year={2021}
}

This work explores the viability of end-to-end convolutional neural network (CNN) inference using OpenCL HLS kernels generated from TVM on Intel FPGAs. We explore layer-pipelined execution for small networks and time-multiplexed kernels for larger CNNs. Naively generated kernels do not produce efficient hardware, so we propose a set of optimizations that increase parallelism, improve resource utilization, and use memory bandwidth more efficiently: loop unrolling, tiling, fusion, loop-invariant code motion, cached writes, OpenCL channels, autorun kernels, concurrent execution, and parameterized kernels. These optimizations improve performance by up to a factor of 1150x over the naive baseline implementation generated by TVM. Compared to Keras/TensorFlow on a 56-core Xeon 8280, we observe speedups of up to 4.57x on LeNet and 1.4x on MobileNet, but a slowdown to 0.43x on ResNet18/34.
