high performance computing on graphics processing units: hgpu.org


A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators

Alexandros Papakonstantinou, Deming Chen, Wen-mei Hwu

Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign

Univ. of Illinois/Urbana-Champaign, 2011


The shift toward parallel computing has resulted in a growing interest in computing systems with heterogeneous processing modules. Reconfigurable devices are often employed in such heterogeneous systems because of their low power consumption and parallel processing capabilities. An important issue in the programmability of these systems is the need for a single programming interface. Recent works have combined parallel programming models with high-level synthesis (HLS) to enable high-abstraction parallel programming of FPGAs. Nevertheless, the efficiency of the generated custom hardware accelerators depends on the structure of the parallel input code: code optimized for programmable multicore devices (e.g., GPUs or CPUs) may yield low-performance custom accelerators. In this work, the authors describe a code optimization framework that analyzes and restructures CUDA kernels originally optimized for GPU devices so that they can be synthesized into efficient custom accelerators on FPGAs. Their experimental results show that the proposed framework can achieve good performance portability.

Tags: Computer science, CUDA, FPGA, Heterogeneous systems, nVidia, Optimization

February 20, 2012 by hgpu

