GCN Inference Acceleration using High-Level Synthesis

Yi Chien Lin, Bingyi Zhang, Viktor Prasanna
University of Southern California, Los Angeles, California
IEEE High Performance Extreme Computing Conference (HPEC), 2021

GCN (Graph Convolutional Network) has become a promising solution for many applications, such as recommendation systems and social data mining. Many of these applications require low-latency GCN inference. In this paper, we provide a case study of GCN inference acceleration on FPGA. We explore the high-level synthesis (HLS) programming model to achieve low-latency inference. First, we propose a partition-centric mapping strategy that maps the execution tasks of GCN onto the FPGA to exploit data reuse, which reduces external memory access overhead. Second, we provide an HLS-based kernel design with improved memory performance that achieves massive data parallelism. Third, we perform design space exploration to facilitate feasible pre-placement, which avoids potential Place-and-Route (PnR) failures. We evaluate our design on a state-of-the-art FPGA platform using three commonly used datasets: Reddit, Yelp, and Amazon-2M. We compare our design with two state-of-the-art libraries, PyTorch-Geometric (PyG) and Deep Graph Library (DGL), running on a high-end CPU and GPU, evaluating latency and energy efficiency for full-batch GCN inference on a two-layer Vanilla-GCN model. Compared with the PyG CPU version, our design reduces latency by 59.95x and is 96.22x more energy efficient on average. Compared with the DGL CPU version, our design achieves a 2.9x-6.4x speedup and is 5.87x more energy efficient. Compared with the DGL GPU version, although the latency of our design is 1.67x-2.5x that of DGL on GPU, our design is 1.8x more energy efficient.
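For readers unfamiliar with the workload being accelerated, the sketch below shows the standard two-layer Vanilla-GCN computation (in the Kipf-Welling formulation), softmax(Â · ReLU(Â · X · W1) · W2) with Â = D^{-1/2}(A + I)D^{-1/2}. This is a minimal full-batch CPU reference in Python, not the authors' FPGA/HLS implementation; all function names and shapes are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def normalize_adj(adj: sp.csr_matrix) -> sp.csr_matrix:
    """Symmetrically normalize A + I, i.e. D^{-1/2} (A + I) D^{-1/2}."""
    adj = adj + sp.eye(adj.shape[0], format="csr")
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(deg ** -0.5)
    return d_inv_sqrt @ adj @ d_inv_sqrt

def gcn_inference(adj, x, w1, w2):
    """Full-batch inference for a two-layer Vanilla-GCN.
    Returns logits; the final row-wise softmax is omitted since it
    does not change the predicted class."""
    a_hat = normalize_adj(adj)
    h = np.maximum(a_hat @ (x @ w1), 0.0)   # layer 1: aggregate, transform, ReLU
    return a_hat @ (h @ w2)                 # layer 2: aggregate, transform

# Toy example: 4-node path graph, 8 input features, 16 hidden units, 3 classes.
rng = np.random.default_rng(0)
adj = sp.csr_matrix(np.array([[0, 1, 0, 0],
                              [1, 0, 1, 0],
                              [0, 1, 0, 1],
                              [0, 0, 1, 0]], dtype=float))
x  = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 3))
print(gcn_inference(adj, x, w1, w2).shape)  # (4, 3)
```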
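The partition-centric mapping is described only at a high level here. As a generic illustration of the data-reuse idea it targets, the sketch below computes the sparse aggregation Â · X by source-node partitions, so each partition's feature tile is loaded once (modeling an on-chip buffer) and reused across all of its outgoing edges, rather than re-fetched per edge from external memory. The partitioning scheme, tile size, and names are our assumptions, not the paper's actual mapping.

```python
import numpy as np
import scipy.sparse as sp

def partitioned_spmm(adj: sp.csr_matrix, x: np.ndarray, part_size: int) -> np.ndarray:
    """Compute adj @ x one source-node partition at a time. Each iteration
    loads one feature tile (x_block) once and reuses it for every edge
    originating in that partition, accumulating into the output."""
    n = adj.shape[0]
    out = np.zeros((n, x.shape[1]))
    adj_csc = adj.tocsc()                       # column slices = source partitions
    for start in range(0, n, part_size):        # part_size models buffer capacity
        end = min(start + part_size, n)
        block = adj_csc[:, start:end]           # edges with sources in this partition
        x_block = x[start:end]                  # feature tile, loaded once and reused
        out += block @ x_block                  # scatter contributions to destinations
    return out
```

Summing the per-partition products is exactly adj @ x; the reuse comes from touching each feature tile once per partition instead of once per edge.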