GRATER: An Approximation Workflow for Exploiting Data-Level Parallelism in FPGA Acceleration
UC San Diego
19th Design Automation and Test in Europe (DATE), 2016
@inproceedings{lotfi2016grater,
title={GRATER: An Approximation Workflow for Exploiting Data-Level Parallelism in FPGA Acceleration},
author={Lotfi, Atieh and Rahimi, Abbas and Yazdanbakhsh, Amir and Esmaeilzadeh, Hadi and Gupta, Rajesh K},
booktitle={19th Design Automation and Test in Europe (DATE)},
year={2016}
}
Modern applications, including graphics, multimedia, web search, and data analytics, not only can benefit from acceleration but also exhibit significant tolerance to imprecise computation. This amenability to approximation provides an opportunity to trade quality of the results for higher performance and better resource utilization. Exploiting this opportunity is particularly important for FPGA accelerators, which are inherently subject to many resource constraints. To better utilize FPGA resources, we devise GRATER, an automated design workflow for FPGA accelerators that leverages imprecise computation to increase data-level parallelism and achieve higher computational throughput. The core of our workflow is a source-to-source compiler that takes an input kernel and applies a novel optimization technique that selectively reduces the precision of the kernel’s data and operations. Selectively reducing this precision decreases the area required to synthesize the kernels on the FPGA, allowing a larger number of operations and parallel kernels to be integrated in the fixed area of the FPGA. The larger number of integrated kernels provides more hardware contexts to better exploit data-level parallelism in the target applications. To effectively explore the design space of approximate kernels, we use a genetic algorithm to find a subset of safe-to-approximate operations and data elements and then tune their precision levels until the desired output quality is achieved. GRATER is a fully software technique and does not require any changes to the underlying FPGA hardware. We evaluate GRATER on a diverse set of data-intensive OpenCL benchmarks from the AMD SDK. Synthesis results on a modern Altera FPGA show that our approximation workflow yields 1.4x to 3.0x higher throughput with less than 1% quality loss.
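The precision-tuning idea described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the toy kernel `a*b + c`, the fixed-point `quantize` helper, the 16-bit "full precision" baseline, the use of total bit count as a stand-in for FPGA area, and the particular crossover and mutation steps are all assumptions made for illustration. Only the overall shape (a genetic search over per-variable precision levels under a quality-loss bound of 1%) follows the abstract.

```python
import random

N_VARS = 3               # the toy kernel has three operands (assumption)
MAX_BITS = 16            # assumed "full precision" in fractional bits
QUALITY_LOSS_MAX = 0.01  # 1% quality-loss bound, as quoted in the abstract


def quantize(x, bits):
    """Round x to a fixed-point value with `bits` fractional bits."""
    scale = 1 << bits
    return round(x * scale) / scale


def kernel(a, b, c, bits=None):
    """Toy kernel a*b + c; `bits` optionally gives one bit width per operand."""
    if bits is not None:
        a, b, c = (quantize(v, n) for v, n in zip((a, b, c), bits))
    return a * b + c


def quality_loss(bits, inputs):
    """Mean relative error of the reduced-precision kernel on `inputs`."""
    total = 0.0
    for a, b, c in inputs:
        exact = kernel(a, b, c)
        total += abs(kernel(a, b, c, bits) - exact) / abs(exact)
    return total / len(inputs)


def cost(bits, inputs):
    """Total bit count (area proxy); infeasible candidates rank last."""
    loss = quality_loss(bits, inputs)
    if loss > QUALITY_LOSS_MAX:
        return N_VARS * MAX_BITS + 1 + loss
    return sum(bits)


def search(inputs, generations=30, pop_size=20, seed=0):
    """Genetic search for the cheapest bit widths that meet the bound."""
    rng = random.Random(seed)
    full = (MAX_BITS,) * N_VARS  # always-feasible starting point
    pop = [full] + [
        tuple(rng.randint(1, MAX_BITS) for _ in range(N_VARS))
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda ind: cost(ind, inputs))
        parents = pop[: pop_size // 2]  # keep the best half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, N_VARS)        # one-point crossover
            child = list(p[:cut] + q[cut:])
            i = rng.randrange(N_VARS)             # mutate one width by +/-1
            child[i] = min(MAX_BITS, max(1, child[i] + rng.choice((-1, 1))))
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=lambda ind: cost(ind, inputs))


def make_inputs(n, seed=1):
    """Sample operands from [0.1, 1.0] so the exact result is never zero."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.1, 1.0) for _ in range(N_VARS)) for _ in range(n)]


inputs = make_inputs(200)
best = search(inputs)
print("bit widths:", best, "quality loss:", quality_loss(best, inputs))
```

Because the full-precision candidate is seeded into the population and the best half survives each generation, the returned bit widths always satisfy the quality bound on the sampled inputs. In a real flow the cost would come from synthesis area reports rather than a bit count, and the freed area would be used to instantiate additional parallel kernel copies.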
December 12, 2015 by hgpu