FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only
Jinhwan Park, Wonyong Sung
Department of Electrical and Computer Engineering, Seoul National University, Seoul 151-744 Korea
arXiv:1602.01616 [cs.AR] (4 Feb 2016)
@article{park2016fpga,
  title={FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only},
  author={Park, Jinhwan and Sung, Wonyong},
  year={2016},
  month={feb},
  eprint={1602.01616},
  archivePrefix={arXiv},
  primaryClass={cs.AR}
}
Deep neural networks (DNNs) demand a very large amount of computation and weight storage, so efficient implementation on special-purpose hardware is highly desirable. In this work, we have developed an FPGA based fixed-point DNN system that uses only on-chip memory and does not access external DRAM. The execution time and energy consumption of the developed system are compared with those of a GPU based implementation. Since the on-chip memory capacity of the FPGA is limited, only 3-bit weights are used, and training based fixed-point weight optimization is employed. The implementation on a Xilinx XC7Z045 is tested on the MNIST handwritten digit recognition benchmark and a phoneme recognition task on the TIMIT corpus. The obtained speed is about one quarter that of a GPU based implementation and much higher than that of a PC based one. The power consumption is less than 5 W at full-speed operation, resulting in much higher energy efficiency than GPU based systems.
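The abstract mentions 3-bit weights obtained through training based fixed-point weight optimization but does not spell out the scheme. Below is a minimal sketch, assuming symmetric uniform quantization to the seven levels {-3Δ, ..., +3Δ} and retraining with a straight-through estimator; the function names, step-size heuristic, and toy gradient are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def quantize_3bit(w, delta):
    # Map each weight to the nearest of the 7 symmetric levels
    # {-3*delta, ..., 0, ..., +3*delta}; a 3-bit signed code has
    # 8 values, so one code goes unused.
    return np.clip(np.round(w / delta), -3, 3) * delta

def retrain_step(w_float, grad_fn, lr, delta):
    # One quantization-aware update (straight-through estimator):
    # the gradient is evaluated at the quantized weights, but the
    # update is applied to the full-precision master copy.
    w_q = quantize_3bit(w_float, delta)
    return w_float - lr * grad_fn(w_q)

# Usage with a toy quadratic loss; the step size delta is derived
# from the weight distribution (a common heuristic, assumed here).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 256))
delta = np.abs(w).std()
target = rng.normal(scale=0.1, size=w.shape)
grad_fn = lambda wq: wq - target  # gradient of 0.5*||w - target||^2
for _ in range(100):
    w = retrain_step(w, grad_fn, lr=0.1, delta=delta)
print("distinct levels:", np.unique(np.round(quantize_3bit(w, delta) / delta)))
```

Keeping a full-precision master copy while the forward and backward passes see only quantized values is what typically lets such coarse 3-bit levels recover most of the floating-point accuracy after retraining.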
February 6, 2016 by hgpu