Programming Heterogeneous Systems from an Image Processing DSL
Stanford University
arXiv:1610.09405 [cs.SE], (28 Oct 2016)
@article{pu2016programming,
title={Programming Heterogeneous Systems from an Image Processing DSL},
author={Pu, Jing and Bell, Steven and Yang, Xuan and Setter, Jeff and Richardson, Stephen and Ragan-Kelley, Jonathan and Horowitz, Mark},
year={2016},
month={oct},
eprint={1610.09405},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming," and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so that users can specify which portions of their applications should become hardware accelerators, and we provide a compiler that uses this code to automatically create the accelerator along with the "glue" code needed for the user’s application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware, but also allows our compiler to generate the complete software program, including the sequential part of the workload, which accesses the hardware for acceleration. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, with the added flexibility of being able to map at various throughput rates. We demonstrate our approach by mapping applications to a Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6x higher performance and 8x lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5x higher performance with 12x lower energy compared to the K1’s 192-core GPU.
November 1, 2016 by hgpu