LLVM-based automation of memory decoupling for OpenCL applications on FPGAs
Department of Electrical and Computer Engineering, University of North Carolina Charlotte, NC, USA, 28223
Microprocessors and Microsystems, 2020
@article{purkayastha2020llvm,
title={LLVM-based automation of memory decoupling for OpenCL applications on FPGAs},
author={Purkayastha, Arnab A and Rogers, Samuel and Shiddibhavi, Suhas A and Tabkhi, Hamed},
journal={Microprocessors and Microsystems},
volume={72},
pages={102909},
year={2020},
publisher={Elsevier}
}
The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs an attractive platform for power-efficient, high-performance execution of massively parallel applications. At the same time, massive thread-level parallelism on FPGAs raises new design challenges. One major execution bottleneck is the high number of memory stalls exposed to the data path, which overshadows the benefits of data-path customization. This article presents a novel LLVM-based tool for decoupling memory access from computation when synthesizing massively parallel OpenCL kernels on FPGAs. To enable systematic decoupling, we use the idea of kernel parallelism and implement a new parallelism granularity that breaks each kernel into a separate data path and memory path (memory read/write). The two paths work concurrently, overlapping the computation of current threads with the memory access of future threads (memory prefetching at large scale). The article also proposes an LLVM-based static analysis that detects decouplable data in order to resolve data dependencies and maximize concurrency across the kernels. Experimental results on eight Rodinia benchmarks on an Intel Stratix V FPGA demonstrate significant performance and energy improvements over the baseline implementation using the Intel OpenCL SDK. The proposed sub-kernel parallelism achieves more than 2x speedup with only a 3% increase in resource utilization and a 7% increase in power consumption, which reduces overall energy consumption by more than 40%.
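As a rough consistency check on the reported figures, the short sketch below relates speedup and power increase to energy. It assumes that energy is simply average power multiplied by execution time, and it plugs in exactly 2x speedup and a 7% power increase as stand-ins for the "more than 2x" and "7%" figures quoted in the abstract. The function name `relative_energy` is hypothetical and is not taken from the paper or its tool.

```python
def relative_energy(speedup: float, power_increase: float) -> float:
    """Energy of the optimized design relative to the baseline (baseline = 1.0).

    Assumes energy = average power * execution time, so a design that draws
    (1 + power_increase) times the power and runs `speedup` times faster
    uses (1 + power_increase) / speedup of the baseline energy.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 + power_increase) / speedup


# Illustrative inputs: exactly 2x speedup and a 7% power increase.
ratio = relative_energy(speedup=2.0, power_increase=0.07)
saving = 1.0 - ratio
print(f"relative energy: {ratio:.3f}, energy saving: {saving:.1%}")
```

Under these assumptions the optimized design uses a little over half of the baseline energy, which is in line with the abstract's claim of a reduction of more than 40%. Any speedup above 2x would increase the saving further.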
January 5, 2020 by hgpu