high performance computing on graphics processing units: hgpu.org

High-performance CUDA kernel execution on FPGAs

Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu

Electrical & Computer Engineering Dept., University of Illinois, Urbana-Champaign, IL, USA

In ICS ’09: Proceedings of the 23rd international conference on Supercomputing (2009), pp. 515-516.

DOI:10.1145/1542275.1542357

In this work, we propose a new FPGA design flow that combines the CUDA programming model from NVIDIA with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
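To make the idea of turning SPMD thread blocks into C code more concrete, here is a minimal sketch in plain C. It is not the paper's actual transformation and uses no AutoPilot-specific directives. It only illustrates the general technique of rewriting a CUDA kernel as explicit loops over blocks and threads. The kernel, the names `vec_add_block` and `vec_add_grid`, and the block size `BLOCK_DIM` are all assumptions chosen for illustration.

```c
/* Original CUDA kernel, shown for reference only (hypothetical example):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

#define BLOCK_DIM 64 /* assumed number of threads per block */

/* One thread block rewritten as a loop over its threads.
 * The iterations are independent of each other, which is the kind of
 * parallelism a high-level synthesis tool could unroll or pipeline. */
static void vec_add_block(int block_idx, const float *a, const float *b,
                          float *c, int n)
{
    for (int thread_idx = 0; thread_idx < BLOCK_DIM; ++thread_idx) {
        int i = block_idx * BLOCK_DIM + thread_idx; /* global thread index */
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }
}

/* The grid rewritten as a loop over thread blocks.
 * Blocks do not depend on one another, so in a multi-core accelerator
 * each block could in principle be assigned to a separate core. */
void vec_add_grid(const float *a, const float *b, float *c, int n)
{
    int num_blocks = (n + BLOCK_DIM - 1) / BLOCK_DIM; /* ceiling division */
    for (int block_idx = 0; block_idx < num_blocks; ++block_idx) {
        vec_add_block(block_idx, a, b, c, n);
    }
}
```

Calling `vec_add_grid(a, b, c, n)` on three arrays of length `n` produces the same result as launching the reference kernel with enough blocks to cover `n` elements. The bounds check inside the thread loop plays the same role as the `if (i < n)` guard in the kernel.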

Tags: Computer science, CUDA, FPGA, Performance, Programming techniques

January 11, 2011 by hgpu
