GRATER: An Approximation Workflow for Exploiting Data-Level Parallelism in FPGA Acceleration

hgpu.org » Applications » Computer science » GRATER: An Approximation Workflow for Exploiting Data-Level Parallelism in FPGA Acceleration

GRATER: An Approximation Workflow for Exploiting Data-Level Parallelism in FPGA Acceleration

Atieh Lotfi, Abbas Rahimi, Amir Yazdanbakhsh, Hadi Esmaeilzadeh, Rajesh K. Gupta

UC San Diego

19th Design Automation and Test in Europe (DATE), 2016

BibTeX

Download (PDF)

View

Source

2460

views

Modern applications including graphics, multimedia, web search, and data analytics not only can benefit from acceleration, but also exhibit significant degrees of tolerance to imprecise computation. This amenability to approximation provides an opportunity to trade quality of the results for higher performance and better resource utilization. Exploiting this opportunity is particularly important for FPGA accelerators that are inherently subject to many resource constraints. To better utilize the FPGA resources, we devise, GRATER, an automated design workflow for FPGA accelerators that leverages imprecise computation to increase data-level parallelism and achieve higher computational throughput. The core of our workflow is a source-to-source compiler that takes in an input kernel and applies a novel optimization technique that selectively reduces the precision of kernel’s data and operations. By selectively reducing the precision of the data and operation, the required area to synthesize the kernels on the FPGA decreases allowing to integrate a larger number of operations and parallel kernels in the fixed area of the FPGA. The larger number of integrated kernels provides more hardware context to better exploit datalevel parallelism in the target applications. To effectively explore the possible design space of approximate kernels, we exploit a genetic algorithm to find a subset of safe-to-approximate operations and data elements and then tune their precision levels until the desired output quality is achieved. GRATER exploits a fully software technique and does not require any changes to the underlying FPGA hardware. We evaluate GRATER on a diverse set of data-intensive OpenCL benchmarks from the AMD SDK. The synthesis result on a modern Altera FPGA shows that our approximation workflow yields 1.4x-3.0x higher throughput with less than 1% quality loss.

Tags: ATI, ATI Radeon HD 5870, Benchmarking, Code generation, Computer science, FPGA, OpenCL

December 12, 2015 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org