Multilevel Granularity Parallelism Synthesis on FPGAs

Alexandros Papakonstantinou, Yun Liang, John A. Stratton, Karthik Gururaj, Deming Chen, Wen-Mei W. Hwu, Jason Cong

Electrical & Computer Eng. Dept., University of Illinois, Urbana-Champaign, IL, USA

IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2011

@inproceedings{papakonstantinoumultilevel,
  title={Multilevel Granularity Parallelism Synthesis on FPGAs},
  author={Papakonstantinou, A. and Liang, Y. and Stratton, J.A. and Gururaj, K. and Chen, D. and Hwu, W.M.W. and Cong, J.},
  booktitle={IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
  year={2011}
}

Recent progress in High-Level Synthesis (HLS) techniques has helped raise the abstraction level of FPGA programming. However, implementing and evaluating the performance of HLS-generated RTL involves lengthy logic synthesis and physical design flows. Moreover, the mapping of different levels of coarse-grained parallelism onto hardware spatial parallelism affects the final FPGA-based performance in terms of both cycles and frequency. Evaluating the rich design space through the full implementation flow, from high-level source code to routed netlist, is prohibitive in various scientific and computing domains, which hinders the adoption of reconfigurable computing. This work presents a framework for multilevel granularity parallelism exploration with HLS-order efficiency. Our framework considers different granularities of parallelism for mapping CUDA kernels onto high-performance FPGA-based accelerators. We leverage resource and clock period models to estimate the impact of multi-granularity parallelism extraction on execution cycles and frequency. The proposed Multilevel Granularity Parallelism Synthesis (ML-GPS) framework employs an efficient design space search heuristic, in tandem with the estimation models and design layout information, to derive a configuration with near-optimal performance. Our experimental results demonstrate that ML-GPS can efficiently identify and generate CUDA kernel configurations that significantly outperform previous related tools, while offering competitive performance compared to software kernel execution on GPUs at a fraction of the energy cost.
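The idea of pairing estimation models with a design space search can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two parallelism knobs (`cores`, `unroll`), the toy cycle, clock period, and resource formulas, and the constants. Exhaustive enumeration also stands in for the search heuristic, which the abstract does not describe.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    cores: int   # hypothetical coarse-grained knob: number of parallel cores
    unroll: int  # hypothetical finer-grained knob: unroll factor per core


def est_cycles(cfg, work=1_000_000):
    # Toy cycle model (assumption): work splits evenly across all lanes.
    return work / (cfg.cores * cfg.unroll)


def est_period_ns(cfg):
    # Toy clock-period model (assumption): larger designs clock slower.
    return 5.0 + 0.05 * cfg.cores * cfg.unroll


def est_resources(cfg):
    # Toy resource model in arbitrary units (assumption).
    return 100 * cfg.cores * cfg.unroll + 50 * cfg.cores


def est_latency_ns(cfg):
    # Estimated latency combines both effects: cycles times clock period.
    return est_cycles(cfg) * est_period_ns(cfg)


def explore(budget, core_opts=(1, 2, 4, 8), unroll_opts=(1, 2, 4, 8)):
    # Enumerate every configuration, keep those that fit the resource
    # budget, and return the one with the lowest estimated latency.
    candidates = (Config(c, u) for c, u in product(core_opts, unroll_opts))
    feasible = [cfg for cfg in candidates if est_resources(cfg) <= budget]
    return min(feasible, key=est_latency_ns) if feasible else None


if __name__ == "__main__":
    best = explore(budget=2000)
    print(best, est_latency_ns(best))
```

The point of the sketch is only that each candidate is scored from models rather than from a full synthesis and place-and-route run, so a configuration that reduces cycles but lengthens the clock period is penalized accordingly.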

Tags: Code generation, Compilers, Computer science, CUDA, FPGA, nVidia, nVidia GeForce 9800 GX2

May 21, 2011 by hgpu
