high performance computing on graphics processing units: hgpu.org


A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators

Alexandros Papakonstantinou, Deming Chen, Wen-mei Hwu

Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign

Univ. of Illinois/Urbana-Champaign, 2011

@article{papakonstantinou2012code,
  title={A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators},
  author={Papakonstantinou, Alexandros and Chen, Deming and Hwu, Wen-mei},
  year={2012}
}


The shift toward parallel computing has resulted in a growing interest in computing systems with heterogeneous processing modules. Reconfigurable devices are often employed in such heterogeneous systems because of their low power consumption and parallel processing capabilities. An important issue in the programmability of these systems is the need for a single programming interface. Recent works have leveraged parallel programming models in tandem with high-level synthesis (HLS) to enable parallel programming of FPGAs at a high level of abstraction. Nevertheless, the efficiency of the generated custom hardware accelerators depends on the structure of the parallel input code: code optimized for programmable multicore devices (e.g., GPUs or CPUs) may result in low-performance custom accelerators. In this work, the researchers describe a code optimization framework that analyzes and restructures CUDA kernels optimized for GPU devices in order to facilitate the synthesis of efficient custom accelerators on FPGAs. Their experimental results show that the proposed framework can achieve good performance portability.
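The abstract does not list the framework's individual transformations. As a generic illustration of what feeding CUDA code to an HLS flow can involve, the sketch below assumes a simple, hypothetical vector-add kernel and shows one common preliminary step: turning CUDA's implicit grid of threads into an explicit C loop nest that an HLS tool can schedule. This is not the authors' framework, and the names `vadd_thread` and `vadd_hls` are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-thread body of a CUDA kernel such as
 *   __global__ void vadd(const float *a, const float *b, float *c, int n)
 *   { int i = blockIdx.x * blockDim.x + threadIdx.x;
 *     if (i < n) c[i] = a[i] + b[i]; }
 * rewritten as plain C, with the built-in indices passed as arguments. */
static void vadd_thread(const float *a, const float *b, float *c, int n,
                        int block_idx, int block_dim, int thread_idx)
{
    int i = block_idx * block_dim + thread_idx;  /* global thread index */
    if (i < n)                                   /* mask off surplus threads */
        c[i] = a[i] + b[i];
}

/* Explicit loop-nest form: each CUDA thread block becomes one iteration of
 * the outer loop and each thread one iteration of the inner loop. This
 * serialization is legal here only because the threads never synchronize
 * (no __syncthreads) and each one writes a distinct element of c. */
void vadd_hls(const float *a, const float *b, float *c, int n, int block_dim)
{
    assert(block_dim > 0);
    int grid_dim = (n + block_dim - 1) / block_dim;  /* ceil(n / block_dim) */
    for (int bx = 0; bx < grid_dim; ++bx)            /* over thread blocks */
        for (int tx = 0; tx < block_dim; ++tx)       /* over threads in a block */
            vadd_thread(a, b, c, n, bx, block_dim, tx);
}
```

For example, calling `vadd_hls(a, b, c, 10, 4)` runs 3 blocks of 4 threads, with the last 2 threads masked off by the `i < n` guard. A GPU-tuned kernel that relies on shared-memory tiling and barrier synchronization cannot be serialized this naively, which is one reason the structure of the input code matters for the resulting accelerator.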

Tags: Computer science, CUDA, FPGA, Heterogeneous systems, nVidia, Optimization

February 20, 2012 by hgpu
