De-specializing an HLS library for Deep Neural Networks: improvements upon hls4ml
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy
arXiv:2103.13060 [cs.AR], 24 Mar 2021
@misc{curzel2021despecializing,
  title={De-specializing an HLS library for Deep Neural Networks: improvements upon hls4ml},
  author={Serena Curzel and Nicolò Ghielmetti and Michele Fiorito and Fabrizio Ferrandi},
  year={2021},
  eprint={2103.13060},
  archivePrefix={arXiv},
  primaryClass={cs.AR}
}
Custom hardware accelerators for Deep Neural Networks are increasingly popular: the flexibility and performance of FPGAs are well suited to the heavy computation and low-latency constraints of many image recognition and natural language processing tasks. However, the gap between high-level Machine Learning frameworks (e.g., TensorFlow, PyTorch) and low-level hardware design in Verilog/VHDL is a barrier to the widespread adoption of FPGAs, one that High-Level Synthesis can help overcome. hls4ml is a framework that translates Deep Neural Networks into annotated C++ code for High-Level Synthesis, offering a complete and user-friendly design process that has been enthusiastically adopted in physics research. We analyze the strengths and weaknesses of hls4ml and draft a plan to enhance its core library of components, enabling more advanced optimizations, targeting a wider selection of FPGAs, and supporting larger Neural Network models.
March 28, 2021 by hgpu