LLVM-based automation of memory decoupling for OpenCL applications on FPGAs


Arnab A Purkayastha, Samuel Rogers, Suhas A Shiddhibhabi, Hamed Tabkhi

Department of Electrical and Computer Engineering, University of North Carolina Charlotte, NC, USA, 28223

Microprocessors and Microsystems, 2020

DOI:10.1016/j.micpro.2019.102909


Source code package: OpenCL-FPGA-LLVM


The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs an attractive platform for power-efficient, high-performance execution of massively parallel applications. At the same time, massive thread-level parallelism on FPGAs raises new design challenges. One major execution bottleneck is the high number of memory stalls exposed to the data path, which overshadows the benefits of data-path customization. This article presents a novel LLVM-based tool for decoupling memory access from computation when synthesizing massively parallel OpenCL kernels on FPGAs. To enable systematic decoupling, we use the idea of kernel parallelism and implement a new parallelism granularity that breaks kernels down into a separate data path and memory path (memory read/write). These paths work concurrently to overlap the computation of current threads with the memory access of future threads (memory prefetching at large scale). The article also proposes an LLVM-based static analysis that detects decouplable data in order to resolve data dependencies and maximize concurrency across the kernels. Experimental results on eight Rodinia benchmarks on an Intel Stratix V FPGA demonstrate significant performance and energy improvements over the baseline implementation using the Intel OpenCL SDK. The proposed sub-kernel parallelism achieves more than 2x speedup with only a 3% increase in resource utilization and a 7% increase in power consumption, which reduces overall energy consumption by more than 40%.
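
As a rough software analogy of the decoupling described in the abstract (not the authors' LLVM tool or the OpenCL it generates), the Python sketch below splits one "kernel" into three concurrently running stages connected by bounded FIFOs: a memory-read stage, a data-path stage, and a memory-write stage. All names (`memory_read`, `data_path`, `memory_write`, `run_decoupled`, the `depth` parameter) and the toy computation are assumptions made for illustration; on an FPGA the stages would be separate hardware kernels rather than threads.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def memory_read(global_in, to_compute):
    """Memory-path (read): fetch operands ahead of the data path."""
    for idx, value in enumerate(global_in):
        to_compute.put((idx, value))  # blocks only when the FIFO is full
    to_compute.put(SENTINEL)


def data_path(to_compute, to_write):
    """Data path: pure computation, never touches 'global memory'."""
    for idx, value in iter(to_compute.get, SENTINEL):
        to_write.put((idx, value * value + 1))  # toy computation
    to_write.put(SENTINEL)


def memory_write(to_write, global_out):
    """Memory-path (write): store results back to 'global memory'."""
    for idx, result in iter(to_write.get, SENTINEL):
        global_out[idx] = result


def run_decoupled(global_in, depth=4):
    """Run the three stages concurrently; depth is the FIFO capacity."""
    to_compute = queue.Queue(maxsize=depth)
    to_write = queue.Queue(maxsize=depth)
    global_out = [None] * len(global_in)
    stages = [
        threading.Thread(target=memory_read, args=(global_in, to_compute)),
        threading.Thread(target=data_path, args=(to_compute, to_write)),
        threading.Thread(target=memory_write, args=(to_write, global_out)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return global_out


if __name__ == "__main__":
    print(run_decoupled([1, 2, 3, 4]))
```

Because the reader can run up to `depth` items ahead of the data path, loads for later work-items overlap with computation on earlier ones, which is the prefetching effect the abstract attributes to the separate memory path.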

Tags: Computer science, FPGA, LLVM, OpenCL, Package

January 5, 2020 by hgpu
