A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators
Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign
Univ. of Illinois/Urbana-Champaign, 2011
@article{papakonstantinou2012code,
title={A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators},
author={Papakonstantinou, Alexandros and Chen, Deming and Hwu, Wen-mei},
year={2012}
}
The shift toward parallel computing has resulted in a growing interest in computing systems with heterogeneous processing modules. Reconfigurable devices are often employed in such heterogeneous systems because of their low-power and parallel-processing benefits. An important issue in the programmability of these systems is the need for a single programming interface. Recent work has leveraged parallel programming models in tandem with high-level synthesis (HLS) to enable high-abstraction parallel programming of FPGAs. Nevertheless, the efficiency of the generated custom hardware accelerators depends on the structure of the parallel input code: code optimized for programmable multicore devices (e.g., GPUs or CPUs) may result in low-performance custom accelerators. In this work, the authors describe a code optimization framework that analyzes and restructures CUDA kernels originally optimized for GPU devices in order to facilitate the synthesis of efficient custom accelerators on FPGAs. Their experimental results show that the proposed framework can achieve good performance portability.
February 20, 2012 by hgpu