Unified Deep Learning with CPU, GPU, and FPGA Technologies

Allen Rush, Ashish Sirasao, Mike Ignatowski
Advanced Micro Devices
AMD Whitepaper, 2017








Deep learning and complex machine learning have quickly become some of the most important computationally intensive applications across a wide variety of fields. The combination of large data sets, high-performance computational capabilities, and evolving and improving algorithms has enabled many successful applications that were previously difficult or impossible to consider. This paper explores the challenges of deep learning training and inference, and discusses the benefits of a comprehensive approach that combines CPU, GPU, and FPGA technologies with the appropriate software frameworks in a unified deep learning architecture. Each of these hardware technologies offers unique benefits for the deep learning problem, and a properly designed system can take advantage of the combination. Moreover, the combination can provide unique capabilities that result in higher performance, better efficiency, greater flexibility, and a hedge against algorithm obsolescence compared to CPU/GPU and FPGA systems designed separately. Beyond the underlying hardware, a unified software environment is necessary to provide a clean interface to the application layer. This environment needs to account for several factors, including framework support, different compiler and code generator technologies, and optimization support for the underlying hardware engines. Higher-level frameworks (e.g., TensorFlow, Theano) can effectively hide most heterogeneity from application developers and enable portability across different systems, which makes them a powerful enabler for heterogeneous hardware. For application developers working below the framework level, the AMD ROCm and MIOpen software frameworks are discussed as an example of a unified software environment applicable to a CPU and GPU solution. FPGAs are primarily used for inference, and the xfDNN middleware from Xilinx captures the software features essential for implementing deep learning inference on FPGAs.
A long-term vision for application developers is a full and seamless programming environment that works across CPUs, GPUs, and FPGAs. This could initially focus on support for a common language and runtime, such as OpenCL, and later be extended to additional languages. The language support would hide any internal differences in compilers and runtimes between the CPU, GPU, and FPGA implementations. Such a seamless programming environment would facilitate full end-to-end optimization of resource allocation.
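To make the idea of a single programming interface over heterogeneous devices concrete, the following is a minimal sketch in plain Python. It is not taken from the whitepaper and does not use ROCm, MIOpen, xfDNN, or OpenCL. All names (`Backend`, `UnifiedRuntime`, `PREFERENCE`, `make_default_runtime`) are hypothetical, the three "devices" are simulated by the same Python function, and the preference order (GPU first for training, FPGA first for inference) is an assumption loosely based on the abstract's remark that FPGAs are primarily used for inference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Reference kernel; every simulated backend runs this same Python code."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


@dataclass(frozen=True)
class Backend:
    """Hypothetical device descriptor: a label, a capability flag, a kernel."""
    name: str
    supports_training: bool
    kernel: Callable[[Vector, Vector], float]


# Assumed preference order per task type (illustrative, not from the paper).
PREFERENCE: Dict[str, Tuple[str, ...]] = {
    "training": ("gpu", "cpu"),
    "inference": ("fpga", "gpu", "cpu"),
}


class UnifiedRuntime:
    """Single entry point that hides which backend executes a task."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, backend: Backend) -> None:
        self._backends[backend.name] = backend

    def select(self, task: str) -> Backend:
        """Return the first registered backend in the preference list for task."""
        if task not in PREFERENCE:
            raise ValueError(f"unknown task: {task!r}")
        for name in PREFERENCE[task]:
            backend = self._backends.get(name)
            if backend is None:
                continue  # device not present in this system
            if task == "training" and not backend.supports_training:
                continue  # device is present but cannot train
            return backend
        raise RuntimeError(f"no registered backend can run {task!r}")

    def run(self, task: str, a: Vector, b: Vector) -> Tuple[str, float]:
        """Run the kernel on the selected backend; report which one was used."""
        backend = self.select(task)
        return backend.name, backend.kernel(a, b)


def make_default_runtime() -> UnifiedRuntime:
    """Build a runtime with simulated CPU, GPU, and FPGA entries."""
    runtime = UnifiedRuntime()
    runtime.register(Backend("cpu", True, dot))
    runtime.register(Backend("gpu", True, dot))
    runtime.register(Backend("fpga", False, dot))
    return runtime
```

With all three entries registered, `make_default_runtime().run("inference", [1.0, 2.0], [3.0, 4.0])` selects the simulated FPGA entry, while the same call with `"training"` selects the simulated GPU entry. The caller's code is identical in both cases, which is the property the abstract attributes to a common language and runtime.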
