GCN Inference Acceleration using High-Level Synthesis
University of Southern California, Los Angeles, California
IEEE High Performance Extreme Computing Conference (HPEC), 2021
@inproceedings{lin2021gcn,
  title={GCN Inference Acceleration using High-Level Synthesis},
  author={Lin, Yi Chien and Zhang, Bingyi and Prasanna, Viktor},
  booktitle={IEEE High Performance Extreme Computing Conference (HPEC)},
  year={2021}
}
GCN (Graph Convolutional Network) has become a promising solution for many applications, such as recommendation systems and social data mining. Many of these applications require low-latency GCN inference. In this paper, we provide a case study of GCN inference acceleration on FPGA. We explore the high-level synthesis (HLS) programming model to achieve low-latency inference. First, we propose a partition-centric mapping strategy that maps the execution tasks of GCN onto the FPGA to exploit data reuse, which reduces external memory access overhead. Second, we provide an HLS-based kernel design with improved memory performance that achieves massive data parallelism. Third, we perform design space exploration to facilitate feasible pre-placement, which avoids potential Place-and-Route (PnR) failures. We evaluate our design on a state-of-the-art FPGA platform using three commonly used datasets: Reddit, Yelp and Amazon-2M. We compare our design with two state-of-the-art libraries, PyTorch-Geometric (PyG) and Deep Graph Library (DGL), running on a high-end CPU and GPU, evaluating latency and energy efficiency for full-batch GCN inference on a two-layer Vanilla-GCN model. Compared with the PyG CPU version, our design reduces the latency by 59.95x and is 96.22x more energy efficient on average. Compared with the DGL CPU version, our design achieves a 2.9x−6.4x speedup and is 5.87x more energy efficient. Compared with the DGL GPU version, although the latency of our design is 1.67x−2.5x that of DGL GPU, our design is 1.8x more energy efficient.
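To give a rough idea of what partition-centric feature aggregation looks like in HLS-style C++, here is a minimal illustrative sketch (not the authors' kernel). It assumes hypothetical parameters PART_NODES and FEAT_LEN and a simple intra-partition edge list: the partition's features are buffered on-chip once, so every edge of that partition reuses buffered data instead of re-reading external memory.

```cpp
// Illustrative sketch only, not from the paper. PART_NODES, FEAT_LEN,
// Edge, and aggregate_partition are hypothetical names.
#include <cstdint>

constexpr int PART_NODES = 1024;   // vertices per partition (assumption)
constexpr int FEAT_LEN   = 128;    // feature vector length (assumption)

struct Edge { int32_t src; int32_t dst; float weight; };

void aggregate_partition(const float *feat_in,   // partition features, row-major
                         const Edge  *edges,     // intra-partition edges
                         int          num_edges,
                         float       *feat_out)  // aggregated partition features
{
    // On-chip buffers: loaded once per partition, reused by all of its edges.
    static float buf_in[PART_NODES][FEAT_LEN];
    static float buf_out[PART_NODES][FEAT_LEN];

load_features:
    for (int v = 0; v < PART_NODES; ++v)
        for (int f = 0; f < FEAT_LEN; ++f) {
            buf_in[v][f]  = feat_in[v * FEAT_LEN + f];
            buf_out[v][f] = 0.0f;
        }

accumulate_edges:
    for (int e = 0; e < num_edges; ++e) {
        const Edge edge = edges[e];
        // In an HLS design the feature-dimension loop would be unrolled for
        // data parallelism; it is written sequentially here for clarity.
        for (int f = 0; f < FEAT_LEN; ++f)
            buf_out[edge.dst][f] += edge.weight * buf_in[edge.src][f];
    }

write_back:
    for (int v = 0; v < PART_NODES; ++v)
        for (int f = 0; f < FEAT_LEN; ++f)
            feat_out[v * FEAT_LEN + f] = buf_out[v][f];
}
```

The point of the sketch is the access pattern: external memory is touched only in the load and write-back loops, while the edge loop works entirely out of on-chip buffers, which is the kind of data reuse the partition-centric mapping targets.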