Exploring FPGA-specific Optimizations for Irregular OpenCL Applications
Electrical & Computer Engineering, Virginia Tech, Blacksburg, VA, USA
Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2018
@inproceedings{hassan2018exploring,
title={Exploring FPGA-specific Optimizations for Irregular OpenCL Applications},
author={Hassan, Mohamed W and Helal, Ahmed E and Athanas, Peter M and Feng, Wu-Chun and Hanafy, Yasser Y},
booktitle={Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig)},
year={2018}
}
OpenCL is emerging as a high-level hardware description language to address the productivity challenges of developing applications on FPGAs. Unlike traditional hardware description languages (HDLs), OpenCL provides an abstract interface to facilitate high productivity, enabling end users to rapidly describe the required computations, including parallelism and data movement, to create custom hardware accelerators for their applications. However, these OpenCL-realized accelerators are unlikely to make efficient use of the reconfigurable fabric without adopting FPGA-specific optimizations, particularly for irregular OpenCL applications. Consequently, we explore the FPGA-specific optimization space for OpenCL applications and present insights on which optimization techniques improve application performance and resource utilization. Exploring this optimization space will enable end users to harness the computational potential of the FPGA. While these optimizations are general and applicable to any application, the expected performance gain and resource-utilization efficiency vary depending on the application characteristics. Specifically, hardware profilers are used to analyze the limitations of OpenCL application kernels and to guide the development of FPGA-optimized implementations. In particular, we pursue the more challenging problem of irregular OpenCL applications, which suffer from workload imbalance, unpredictable control flow, and irregular memory-access patterns. Experiments using representative kernels from the graph traversal, combinational logic, and sparse linear algebra application domains show that FPGA-specific optimizations can improve the performance of irregular OpenCL applications by up to 27-fold in comparison to the architecture-agnostic OpenCL code from the OpenDwarfs benchmark suite.
January 20, 2019 by hgpu