LLVM-based automation of memory decoupling for OpenCL applications on FPGAs

Arnab A Purkayastha, Samuel Rogers, Suhas A Shiddibhavi, Hamed Tabkhi
Department of Electrical and Computer Engineering, University of North Carolina Charlotte, NC, USA, 28223
Microprocessors and Microsystems, 2020

@article{purkayastha2020llvm,
   title={LLVM-based automation of memory decoupling for OpenCL applications on FPGAs},
   author={Purkayastha, Arnab A and Rogers, Samuel and Shiddibhavi, Suhas A and Tabkhi, Hamed},
   journal={Microprocessors and Microsystems},
   volume={72},
   pages={102909},
   year={2020},
   publisher={Elsevier}
}

The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs an attractive platform for power-efficient, high-performance execution of massively parallel applications. At the same time, new design challenges emerge for massive thread-level parallelism on FPGAs. One major execution bottleneck is the high number of memory stalls exposed to the data-path, which overshadows the benefits of data-path customization. This article presents a novel LLVM-based tool for decoupling memory access from computation when synthesizing massively parallel OpenCL kernels on FPGAs. To enable systematic decoupling, we use the idea of kernel parallelism and implement a new parallelism granularity that breaks kernels down into a separate data-path and memory-path (memory read/write), which work concurrently to overlap the computation of current threads with the memory access of future threads (memory pre-fetching at large scale). In addition, this paper proposes an LLVM-based static analysis that detects decouplable data in order to resolve data dependencies and maximize concurrency across the kernels. The experimental results on eight Rodinia benchmarks on an Intel Stratix V FPGA demonstrate significant performance and energy improvements over the baseline implementation using the Intel OpenCL SDK. The proposed sub-kernel parallelism achieves more than 2x speedup, with only a 3% increase in resource utilization and a 7% increase in power consumption, which reduces overall energy consumption by more than 40%.
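
The decoupling idea in the abstract can be illustrated with a rough software analogy. The sketch below is not the authors' LLVM tool and does not reproduce their generated OpenCL kernels; it only assumes that a kernel is split into a memory-read stage, a compute stage, and a memory-write stage that run concurrently and communicate through bounded FIFOs. All names (`memory_read`, `compute`, `memory_write`, `run_decoupled`, `depth`) and the arithmetic in the compute stage are hypothetical.

```python
import queue
import threading

# Unique end-of-stream marker (hypothetical; stands in for "no more threads").
SENTINEL = object()


def memory_read(src, out_q):
    """Memory-read stage: fetches operands ahead of the compute stage."""
    for value in src:
        out_q.put(value)  # blocks only when the FIFO is full
    out_q.put(SENTINEL)


def compute(in_q, out_q):
    """Data-path stage: reads and writes FIFOs only, never 'global memory'."""
    while True:
        value = in_q.get()
        if value is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put(value * value + 1)  # placeholder computation


def memory_write(in_q, dst):
    """Memory-write stage: drains results into the output buffer."""
    while True:
        value = in_q.get()
        if value is SENTINEL:
            return
        dst.append(value)


def run_decoupled(src, depth=4):
    """Run the three stages concurrently; `depth` bounds how far reads run ahead."""
    read_q = queue.Queue(maxsize=depth)
    write_q = queue.Queue(maxsize=depth)
    dst = []
    stages = [
        threading.Thread(target=memory_read, args=(src, read_q)),
        threading.Thread(target=compute, args=(read_q, write_q)),
        threading.Thread(target=memory_write, args=(write_q, dst)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return dst
```

In this analogy, the FIFO depth plays the role of the prefetch window: the read stage can run up to `depth` items ahead of the compute stage, so a slow fetch for a future item overlaps with computation on the current one. Only data whose addresses do not depend on the computation itself could be moved into the read stage this way, which is the kind of property a static analysis for decouplable data would need to establish.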