
Hardware Acceleration for Neural Networks: A Comprehensive Survey

Bin Xu, Ayan Banerjee, Sandeep Gupta
School of Electrical, Computer and Energy Engineering, Arizona State University, USA
arXiv:2512.23914 [eess.SY] (30 Dec 2025)

@misc{xu2025hardwareaccelerationneuralnetworks,
   title={Hardware Acceleration for Neural Networks: A Comprehensive Survey},
   author={Bin Xu and Ayan Banerjee and Sandeep Gupta},
   year={2025},
   eprint={2512.23914},
   archivePrefix={arXiv},
   primaryClass={eess.SY},
   url={https://arxiv.org/abs/2512.23914}
}


Neural networks have become a dominant computational workload across cloud and edge platforms, but rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement, communication, and irregular operators rather than peak arithmetic throughput. This survey reviews the technology landscape for hardware acceleration of deep learning, spanning GPUs and tensor-core architectures; domain-specific accelerators (e.g., TPUs/NPUs); FPGA-based designs; ASIC inference engines; and emerging LLM-serving accelerators such as LPUs (language processing units), alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the space using a unified taxonomy across (i) workloads (CNNs, RNNs, GNNs, and Transformers/LLMs), (ii) execution settings (training vs. inference; datacenter vs. edge), and (iii) optimization levers (reduced precision, sparsity and pruning, operator fusion, compilation and scheduling, and memory-system/interconnect design). We synthesize key architectural ideas including systolic arrays, vector and SIMD engines, specialized attention and softmax kernels, quantization-aware datapaths, and high-bandwidth memory, and we discuss how software stacks and compilers bridge model semantics to hardware. Finally, we highlight open challenges — including efficient long-context LLM inference (KV-cache management), robust support for dynamic and sparse workloads, energy- and security-aware deployment, and fair benchmarking — and point to promising directions for the next generation of neural acceleration.
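To make the KV-cache challenge concrete, below is a minimal NumPy sketch of single-head autoregressive decoding; it is our illustration, not material from the survey, and names such as KVCache and decode_step, along with the shapes used, are assumptions chosen for clarity. Each generated token appends one key/value row to a cache so later steps attend over the prefix without recomputing it.

# A minimal sketch (illustrative, not from the paper) of KV-cache reuse during
# autoregressive decoding with single-head scaled dot-product attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Grows by one (key, value) row per generated token, so each decode step
    computes attention only for the newest query instead of re-running the prefix."""
    def __init__(self, d_model):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(q, k, v, cache):
    # Cache the new token's key/value, then attend over the whole cached prefix.
    cache.append(k, v)
    scores = cache.keys @ q / np.sqrt(q.shape[-1])   # (t,)
    weights = softmax(scores)
    return weights @ cache.values                     # (d_model,)

# Usage: four decode steps with random projections standing in for a real model.
rng = np.random.default_rng(0)
d = 8
cache = KVCache(d)
for _ in range(4):
    q, k, v = rng.standard_normal((3, d))
    out = decode_step(q, k, v, cache)
print(out.shape, cache.keys.shape)  # (8,) (4, 8)

Because the cache grows linearly with context length, long-context serving shifts the bottleneck from arithmetic to memory capacity and bandwidth, which is the kind of pressure the surveyed memory systems and LLM-serving accelerators are designed to relieve.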
