Multilevel Granularity Parallelism Synthesis on FPGAs
Electrical & Computer Eng. Dept., University of Illinois, Urbana-Champaign, IL, USA
IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2011
@inproceedings{papakonstantinoumultilevel,
title={Multilevel Granularity Parallelism Synthesis on FPGAs},
author={Papakonstantinou, A. and Liang, Y. and Stratton, J.A. and Gururaj, K. and Chen, D. and Hwu, W.M.W. and Cong, J.},
booktitle={IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
year={2011}
}
Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementation and performance evaluation of the HLS-generated RTL involves lengthy logic synthesis and physical design flows. Moreover, the mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance both in terms of cycles and frequency. Evaluating the rich design space through the full implementation flow, starting from high-level source code and ending with a routed netlist, is prohibitive in various scientific and computing domains, thus hindering the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-like efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic in tandem with the estimation models and design layout information to derive a configuration with near-optimal performance. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that significantly outperform previous related tools, while offering competitive performance compared to software kernel execution on GPUs at a fraction of the energy cost.
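To illustrate the kind of exploration the abstract describes, below is a minimal, hypothetical Python sketch (not the authors' tool): it enumerates candidate configurations along several parallelism granularities (core count, loop unroll factor, threads per core), scores each with placeholder resource and clock-period models, and keeps the fastest estimated configuration that fits the device. All function names, constants, and the exhaustive enumeration are assumptions made for illustration; the paper describes a more efficient search heuristic and real estimation models.

from itertools import product

# Hypothetical FPGA resource budget (LUTs) -- illustrative only.
DEVICE_LUTS = 150_000

def estimate_resources(cores, unroll, threads):
    """Placeholder resource model: cost grows with replicated cores and unrolling."""
    per_core = 4_000 + 1_500 * unroll + 20 * threads
    return cores * per_core

def estimate_clock_period_ns(cores, unroll):
    """Placeholder clock-period model: larger designs tend to achieve lower frequency."""
    return 5.0 + 0.05 * cores + 0.2 * unroll

def estimate_cycles(workload, cores, unroll, threads):
    """Placeholder cycle model: work divided across cores, threads, and unrolled iterations."""
    return workload / (cores * threads * unroll) + 200  # fixed pipeline overhead

def search(workload=1_000_000):
    """Pick the configuration with the lowest estimated latency that fits the device."""
    best = None
    for cores, unroll, threads in product([1, 2, 4, 8], [1, 2, 4], [32, 64, 128]):
        if estimate_resources(cores, unroll, threads) > DEVICE_LUTS:
            continue  # configuration does not fit the device
        latency_ns = (estimate_cycles(workload, cores, unroll, threads)
                      * estimate_clock_period_ns(cores, unroll))
        if best is None or latency_ns < best[0]:
            best = (latency_ns, dict(cores=cores, unroll=unroll, threads=threads))
    return best

if __name__ == "__main__":
    latency, config = search()
    print(f"best estimated latency: {latency / 1e6:.2f} ms with {config}")

The point of the sketch is the structure of the problem: because every candidate is scored analytically instead of through logic synthesis and place-and-route, the design space can be explored at HLS-like speed, which is the central claim of ML-GPS.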
May 21, 2011 by hgpu