high performance computing on graphics processing units: hgpu.org

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Allen Rush, Ashish Sirasao, Mike Ignatowski

Advanced Micro Devices

AMD Whitepaper, 2017

Deep learning and complex machine learning have quickly become among the most important computationally intensive applications for a wide variety of fields. The combination of large data sets, high-performance computational capabilities, and evolving and improving algorithms has enabled many successful applications that were previously difficult or impossible to consider. This paper explores the challenges of deep learning training and inference, and discusses the benefits of a comprehensive approach for combining CPU, GPU, and FPGA technologies, along with the appropriate software frameworks, in a unified deep learning architecture. Each of these hardware technologies offers unique benefits to the deep learning problem, and a properly designed system can take advantage of this combination. Moreover, the combination can provide unique capabilities that result in higher performance, better efficiency, greater flexibility, and a hedge against algorithm obsolescence compared to CPU/GPU and FPGA systems designed separately. Aside from the underlying hardware approaches, a unified software environment is necessary to provide a clean interface to the application layer. This environment needs to account for several factors, including framework support, different compiler and code generator technologies, and optimization support for the underlying hardware engines. Higher-level frameworks (e.g., TensorFlow, Theano) can effectively hide most heterogeneity from application developers as well as enable portability across different systems. This is a powerful enabler for heterogeneous hardware. For application developers working below the framework level, the AMD ROCm and MIOpen software frameworks are discussed as an example of a unified software environment applicable to a CPU and GPU solution. FPGAs are primarily used for inference, and the xfDNN middleware from Xilinx captures the software features essential for implementing deep learning inference on FPGAs.
A long-term vision for application developers is a full and seamless programming environment that works across CPUs, GPUs, and FPGAs. This could initially focus on support for a common language and runtime, such as OpenCL, and later be extended to additional languages. The language support would hide any internal differences in compilers and runtimes between the CPU, GPU, and FPGA implementations. This seamless programming environment will facilitate the full end-to-end optimization of resource allocation.

Tags: Computer science, Deep learning, FPGA, Heterogeneous systems, Machine learning, OpenCL

November 21, 2017 by hgpu
