GCN Inference Acceleration using High-Level Synthesis
University of Southern California, Los Angeles, California
IEEE High Performance Extreme Computing Conference (HPEC), 2021
@inproceedings{lin2021gcn,
  title={GCN Inference Acceleration using High-Level Synthesis},
  author={Lin, Yi Chien and Zhang, Bingyi and Prasanna, Viktor},
  booktitle={IEEE High Performance Extreme Computing Conference (HPEC)},
  year={2021}
}
GCN (Graph Convolutional Network) has become a promising solution for many applications, such as recommendation systems and social data mining. Many of these applications require low-latency GCN inference. In this paper, we provide a case study of GCN inference acceleration on FPGA. We explore the high-level synthesis (HLS) programming model to achieve low-latency inference. First, we propose a partition-centric mapping strategy that maps the execution tasks of GCN onto the FPGA to exploit data reuse, which reduces external memory access overhead. Second, we provide an HLS-based kernel design with improved memory performance that achieves massive data parallelism. Third, we perform design space exploration to facilitate feasible pre-placement, which avoids potential Place-and-Route (PnR) failures. We evaluate our design on a state-of-the-art FPGA platform using three commonly used datasets: Reddit, Yelp and Amazon-2M. We compare our design with two state-of-the-art libraries, PyTorch-Geometric (PyG) and Deep Graph Library (DGL), running on a high-end CPU and GPU, evaluating latency and energy efficiency for full-batch GCN inference on a two-layer Vanilla-GCN model. Compared with the PyG CPU version, our design reduces the latency by 59.95x and is 96.22x more energy efficient on average. Compared with the DGL CPU version, our design achieves a 2.9x−6.4x speedup and is 5.87x more energy efficient. Compared with the DGL GPU version, although the latency of our design is 1.67x−2.5x that of DGL GPU, our design is 1.8x more energy efficient.
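To give a rough idea of what partition-centric feature aggregation looks like in HLS-style C++, here is a minimal illustrative sketch (not the authors' kernel). It assumes hypothetical parameters PART_NODES and FEAT_LEN and a simple intra-partition edge list: the partition's features are buffered on-chip once, so every edge of that partition reuses buffered data instead of re-reading external memory.

```cpp
// Illustrative sketch only, not from the paper. PART_NODES, FEAT_LEN,
// Edge, and aggregate_partition are hypothetical names.
#include <cstdint>

constexpr int PART_NODES = 1024;   // vertices per partition (assumption)
constexpr int FEAT_LEN   = 128;    // feature vector length (assumption)

struct Edge { int32_t src; int32_t dst; float weight; };

void aggregate_partition(const float *feat_in,   // partition features, row-major
                         const Edge  *edges,     // intra-partition edges
                         int          num_edges,
                         float       *feat_out)  // aggregated partition features
{
    // On-chip buffers: loaded once per partition, reused by all of its edges.
    static float buf_in[PART_NODES][FEAT_LEN];
    static float buf_out[PART_NODES][FEAT_LEN];

load_features:
    for (int v = 0; v < PART_NODES; ++v)
        for (int f = 0; f < FEAT_LEN; ++f) {
            buf_in[v][f]  = feat_in[v * FEAT_LEN + f];
            buf_out[v][f] = 0.0f;
        }

accumulate_edges:
    for (int e = 0; e < num_edges; ++e) {
        const Edge edge = edges[e];
        // In an HLS design the feature-dimension loop would be unrolled for
        // data parallelism; it is written sequentially here for clarity.
        for (int f = 0; f < FEAT_LEN; ++f)
            buf_out[edge.dst][f] += edge.weight * buf_in[edge.src][f];
    }

write_back:
    for (int v = 0; v < PART_NODES; ++v)
        for (int f = 0; f < FEAT_LEN; ++f)
            feat_out[v * FEAT_LEN + f] = buf_out[v][f];
}
```

The point of the sketch is the access pattern: external memory is touched only in the load and write-back loops, while the edge loop works entirely out of on-chip buffers, which is the kind of data reuse the partition-centric mapping targets.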