high performance computing on graphics processing units: hgpu.org


A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators

Alexandros Papakonstantinou, Deming Chen, Wen-mei Hwu

Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign

Univ. of Illinois/Urbana-Champaign, 2011


The shift toward parallel computing has resulted in growing interest in computing systems with heterogeneous processing modules. Reconfigurable devices are often employed in such heterogeneous systems because of their low power consumption and parallel processing capabilities. An important issue in the programmability of these systems is the need for a single programming interface. Recent works have used parallel programming models in tandem with high-level synthesis (HLS) to enable high-abstraction parallel programming of FPGAs. Nevertheless, the efficiency of the generated custom hardware accelerators depends on the structure of the parallel input code: code optimized for programmable multicore devices (e.g., GPUs or CPUs) may yield low-performance custom accelerators. In this work, the researchers describe a code optimization framework that analyzes and restructures CUDA kernels optimized for GPU devices in order to facilitate the synthesis of efficient custom accelerators on FPGAs. Their experimental results show that the proposed framework can achieve good performance portability.
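As background for how a CUDA kernel can become input to an HLS tool, the sketch below shows the generic "thread loop" style of translation. The implicit grid of CUDA threads is turned into explicit nested C loops that a C-based HLS flow can consume. This is an illustrative assumption, not the restructuring performed by the framework in this paper. The function name `saxpy_thread_loops` and its `grid_dim`/`block_dim` parameters are hypothetical.

```c
/* Hypothetical illustration (not the paper's framework): a CUDA-style SAXPY
 * kernel whose implicit thread grid is rewritten as explicit loops, giving
 * sequential C code of the kind a C-based HLS tool accepts.
 *
 * Original CUDA kernel, for reference only (not compiled here):
 *   __global__ void saxpy(int n, float a, const float *x, float *y) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) y[i] = a * x[i] + y[i];
 *   }
 */
void saxpy_thread_loops(int n, float a, const float *x, float *y,
                        int grid_dim, int block_dim)
{
    /* Outer loop stands in for the grid of thread blocks (blockIdx.x). */
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        /* Inner "thread loop" stands in for the threads of one block
         * (threadIdx.x). Serializing threads like this is valid here only
         * because the kernel contains no __syncthreads() barrier. */
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            int i = block_idx * block_dim + thread_idx;
            if (i < n) /* same bounds guard as the CUDA kernel */
                y[i] = a * x[i] + y[i];
        }
    }
}
```

A launch such as `saxpy<<<2, 4>>>(5, 2.0f, x, y)` corresponds to `saxpy_thread_loops(5, 2.0f, x, y, 2, 4)`. Kernels that do use `__syncthreads()` would need the thread loop split at each barrier. How well the resulting loops map to hardware still depends on how the original kernel was structured for the GPU, which is the issue the abstract raises.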

Tags: Computer science, CUDA, FPGA, Heterogeneous systems, nVidia, Optimization

February 20, 2012 by hgpu
