
Deep Learning Based FPGA-CPU Acceleration

R. Luis Jalabert
Norwegian University of Science and Technology, Faculty of Information Technology and Electrical Engineering, Department of Electronic Systems
Norwegian University of Science and Technology, 2019

@mastersthesis{jalabert2019deep,
   title={Deep Learning Based FPGA-CPU Acceleration},
   author={Jalabert, Rodolfo},
   year={2019},
   school={NTNU}
}

The purpose of this project is to continue exploring new ways of accelerating sequential computer code and to find out whether the machine learning techniques available today can help us in this task. The core idea is to parallelize, at run time and completely transparently to the programmer, the code being executed on the CPU by snooping on the RAM-CPU communication bus; if an artificial neural network considers a stretch of code parallelizable, it is converted into an FPGA module and executed there instead. The work builds upon a previous project in which we used a very simple RISC CPU, called LT16x32, and created several software applications to aid us in this enterprise. First, a parametric assembly code generator named LoopGen allowed us to create many examples of code with different degrees of parallelism. Next, a program called LoopSim generated the neural network's input datasets by compiling the generated code and simulating its execution on the CPU. In this project we improved on the previous neural network architecture by switching from an LSTM to a convolutional model, achieving a much higher classification accuracy, and we implemented two synthesizable hardware models of this network: one based on high-level synthesis (using C++ in Vivado HLS) and another designed by traditional RTL means (using VHDL in Vivado). In addition, we have integrated these hardware blocks with the rest of the system, namely the CPU, the memory and the instruction pre-processor, and the integrated system successfully replicates the results obtained by the software model. While more work is needed to obtain a complete and more realistic system, this project paves the way for future endeavours in the quest for automatic run-time loop parallelization.
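As a rough illustration of the classification step described above (and not the actual network from the thesis), the sketch below shows a minimal 1D convolutional pass over a window of encoded instructions, written in C++ since that is the language used for the HLS model; all sizes, the feature encoding and the names (Window, ConvWeights, classify_window) are assumptions made for this example.

// Minimal sketch of a 1D convolutional classifier pass (illustrative only).
// A window of WINDOW instructions, each encoded as FEATURES values; the
// sizes, names and encoding are assumptions, not the thesis's actual model.
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t WINDOW   = 64;  // instructions per snooped window
constexpr std::size_t FEATURES = 8;   // values per encoded instruction
constexpr std::size_t KERNEL   = 3;   // convolution kernel width
constexpr std::size_t CHANNELS = 16;  // convolution output channels

using Window = std::array<std::array<float, FEATURES>, WINDOW>;

struct ConvWeights {
    float kernel[CHANNELS][KERNEL][FEATURES];
    float bias[CHANNELS];
    float dense[CHANNELS];   // final layer: one "parallelizable" logit
    float dense_bias;
};

// Returns a single logit: > 0 means "treat this window as parallelizable".
float classify_window(const Window& in, const ConvWeights& w) {
    float pooled[CHANNELS];                        // global max-pooling buffer
    std::fill(std::begin(pooled), std::end(pooled), -1e30f);

    // Valid 1D convolution along the instruction axis, ReLU, then max pool.
    for (std::size_t pos = 0; pos + KERNEL <= WINDOW; ++pos) {
        for (std::size_t c = 0; c < CHANNELS; ++c) {
            float acc = w.bias[c];
            for (std::size_t k = 0; k < KERNEL; ++k)
                for (std::size_t f = 0; f < FEATURES; ++f)
                    acc += w.kernel[c][k][f] * in[pos + k][f];
            acc = std::max(acc, 0.0f);             // ReLU
            pooled[c] = std::max(pooled[c], acc);  // global max pool
        }
    }

    // Dense layer collapses the pooled channels into one decision logit.
    float logit = w.dense_bias;
    for (std::size_t c = 0; c < CHANNELS; ++c)
        logit += w.dense[c] * pooled[c];
    return logit;
}

In an HLS flow, a function of this shape would typically also be converted to fixed-point arithmetic and annotated with synthesis directives before being turned into an FPGA block; those details are omitted here.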