Analysis and Comparison of Performance and Power Consumption of Neural Networks on CPU, GPU, TPU and FPGA
Institut für Informatik, Universität Hildesheim
Universität Hildesheim, 2021
@article{hesse2021analysis,
  title={Analysis and Comparison of Performance and Power Consumption of Neural Networks on CPU, GPU, TPU and FPGA},
  author={Hesse, Christopher Noel},
  year={2021}
}
In this work, we analyze the performance of neural networks on a variety of heterogeneous platforms. We strive to find the best platform in terms of raw benchmark performance, performance per watt, and performance per Euro. To reach this goal, we focus on convolutional neural networks, create several micro- and macrobenchmark applications, and use a state-of-the-art real-world network, YOLOv3. We parametrize the benchmarks to analyze the effect of input size, kernel size, and other variables on performance and efficiency. Our results show that a system using FPGA accelerators is about 7x to 45x faster than a comparable system using high-end GPUs, while consuming about 10% of the power. Novel heterogeneous architectures like the Apple M1 integrated SoC offer 3-5x better performance while drawing 10-20% of the power of existing consumer hardware with x86 CPUs and third-party GPUs. We conclude that the FPGA is the most effective accelerator for neural networks in heterogeneous systems: it outmatches powerful server-class GPU hardware in both performance and efficiency. The Apple M1 SoC offers the best performance per Euro in our tests.
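To illustrate the kind of parametrized convolution microbenchmark the abstract describes, the sketch below times a single convolution layer while sweeping input size and kernel size. It is not the author's benchmark suite: the use of PyTorch, the channel count, the iteration counts, and the chosen size ranges are assumptions made purely for illustration.

# Illustrative sketch only: a parametrized Conv2d microbenchmark.
# The paper's actual benchmark code and parameters are not shown on this page;
# the layer configuration and sweep ranges below are assumptions.
import time
import torch

def bench_conv(input_size, kernel_size, channels=64, iters=50, device="cpu"):
    """Return the average time (seconds) per forward pass of one Conv2d layer."""
    x = torch.randn(1, channels, input_size, input_size, device=device)
    conv = torch.nn.Conv2d(channels, channels, kernel_size,
                           padding=kernel_size // 2).to(device)
    # Warm-up runs exclude one-time allocation costs from the measurement.
    with torch.no_grad():
        for _ in range(5):
            conv(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Sweep input and kernel sizes, analogous to the parametrized benchmarks
# described in the abstract.
for n in (128, 256, 512):
    for k in (1, 3, 5, 7):
        print(f"input={n} kernel={k}: {bench_conv(n, k) * 1e3:.2f} ms/iter")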
December 5, 2021 by hgpu