high performance computing on graphics processing units: hgpu.org


FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs

Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu

Electrical & Computer Eng. Dept., University of Illinois, Urbana-Champaign, IL, USA

IEEE 7th Symposium on Application Specific Processors, 2009. SASP ’09

DOI:10.1109/SASP.2009.5226333

@conference{papakonstantinou2009fcuda,
  title={FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs},
  author={Papakonstantinou, A. and Gururaj, K. and Stratton, J.A. and Chen, D. and Cong, J. and Hwu, W.M.W.},
  booktitle={Application Specific Processors, 2009. SASP'09. IEEE 7th Symposium on},
  pages={35--42},
  year={2009},
  organization={IEEE}
}


As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore’s law, the computing industry has switched its route to higher performance through parallel processing. The rise of multi-core systems in all domains of computing has opened the door to heterogeneous multi-processors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute-intensive kernels of scientific, imaging and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is often not a push-button task: the programmer frequently has to expose the application’s fine- and coarse-grained parallelism by using special APIs. CUDA is such a parallel-computing API that is driven by the GPGPU industry and is gaining significant popularity. In this work, we adapt the CUDA programming model into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SPMD CUDA thread blocks into parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multi-core accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.
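To make the source-to-source idea in the abstract more concrete, here is a minimal sketch in plain C of how an SPMD thread block can be rewritten as explicit loops. It is an illustration only, not FCUDA's actual output: the vector-add kernel, the function names (`vec_add_block`, `vec_add_grid`, `vec_add_demo`) and the loop structure are all assumptions made for this example, and no AutoPilot-specific pragmas or annotations are shown.

```c
#include <assert.h>

/* Original CUDA kernel, shown only for reference (not compiled here):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block as plain C (hypothetical name): the implicit threadIdx.x
 * becomes an explicit loop index. The iterations are independent, so a
 * high-level synthesis tool would be free to unroll or pipeline them. */
static void vec_add_block(const float *a, const float *b, float *c, int n,
                          int block_idx, int block_dim)
{
    assert(block_dim > 0);
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n)              /* same bounds guard as the CUDA kernel */
            c[i] = a[i] + b[i];
    }
}

/* The grid launch becomes a loop over blocks (hypothetical name). Blocks do
 * not depend on each other, so each call could be mapped to its own core. */
static void vec_add_grid(const float *a, const float *b, float *c, int n,
                         int grid_dim, int block_dim)
{
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx)
        vec_add_block(a, b, c, n, block_idx, block_dim);
}

/* Small self-check: runs the loop version and returns the number of
 * elements that differ from the expected sum. */
static int vec_add_demo(void)
{
    enum { N = 10, BLOCK_DIM = 4 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
        c[i] = -1.0f;           /* sentinel so unwritten slots are detected */
    }
    int grid_dim = (N + BLOCK_DIM - 1) / BLOCK_DIM;  /* round up, as in CUDA */
    vec_add_grid(a, b, c, N, grid_dim, BLOCK_DIM);

    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        if (c[i] != 3.0f * (float)i)
            ++mismatches;
    return mismatches;
}
```

In this sketch the inner loop stands in for the fine-grained (thread-level) parallelism and the outer loop for the coarse-grained (block-level) parallelism; how FCUDA actually partitions and annotates them is described in the paper itself.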

Tags: Compilers, Computer science, CUDA, FPGA, High-level Languages

January 11, 2011 by hgpu


