high performance computing on graphics processing units: hgpu.org

Parametric GPU Code Generation for Affine Loop Programs

Athanasios Konstantinidis, Paul H. J. Kelly, J. Ramanujam, P. Sadayappan

Imperial College London

The 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2013

@article{konstantinidis2013parametric,
  title={Parametric GPU Code Generation for Affine Loop Programs},
  author={Konstantinidis, Athanasios and Kelly, Paul HJ and Ramanujam, J and Sadayappan, P},
  year={2013}
}

Package: pTileGPU

Partitioning a parallel computation into finitely sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space, where each partition is subject to a fine-grained distribution of resources that has a direct yet hard-to-estimate impact on performance. This paper develops the first compilation scheme for generating parametrically tiled codes for affine loop programs on GPUs, which facilitates run-time exploration of partitioning parameters as a fast and portable way of finding the ones that yield maximum performance. Our approach is based on a parametric tiling scheme for producing wavefronts of parallel rectangular partitions of parametric size, and on a novel runtime system that manages wavefront execution and local memory usage dynamically through an inspector-executor mechanism. Our experimental evaluation demonstrates the effectiveness of our approach for wavefront as well as rectangularly-parallel partitionings.
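
The idea of wavefronts of rectangular tiles whose sizes are chosen at run time can be illustrated with a small sketch. It is not the paper's code generator or runtime system; it is a plain Python illustration under stated assumptions: a 2D iteration space, a hypothetical recurrence A[i][j] = A[i-1][j] + A[i][j-1], and hypothetical function names (`tile_wavefronts`, `run_tiled`, `run_reference`). Tiles with the same index sum a + b lie on one wavefront and do not depend on each other, so they could in principle run in parallel.

```python
from math import ceil


def tile_wavefronts(n, m, ti, tj):
    """Group the rectangular tiles of an n x m space into wavefronts.

    ti and tj are run-time tile sizes (assumed >= 1; n, m assumed >= 1).
    Tile (a, b) is placed on wavefront a + b.
    """
    nti, ntj = ceil(n / ti), ceil(m / tj)
    fronts = [[] for _ in range(nti + ntj - 1)]
    for a in range(nti):
        for b in range(ntj):
            fronts[a + b].append((a, b))
    return fronts


def run_tiled(n, m, ti, tj):
    """Evaluate the illustrative recurrence tile by tile, front by front."""
    A = [[1] * m for _ in range(n)]  # row 0 and column 0 stay at 1
    for front in tile_wavefronts(n, m, ti, tj):
        # Tiles within one front are mutually independent; they are
        # visited sequentially here only to keep the sketch simple.
        for a, b in front:
            for i in range(max(1, a * ti), min(n, (a + 1) * ti)):
                for j in range(max(1, b * tj), min(m, (b + 1) * tj)):
                    A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A


def run_reference(n, m):
    """Untiled evaluation of the same recurrence, for comparison."""
    A = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A
```

Because the tile sizes are ordinary arguments, different values of `ti` and `tj` can be tried without regenerating any code, for example by comparing `run_tiled(7, 5, 3, 2)` and `run_tiled(7, 5, 2, 4)` against `run_reference(7, 5)`. This mirrors, in a very simplified form, the run-time exploration of partitioning parameters that the abstract describes.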

Tags: Code generation, Computer science, CUDA, nVidia, nVidia GeForce GT 540 M, nVidia GeForce GTX 580, OpenCL, Package, Tesla K20, Tesla M2070

October 5, 2013 by hgpu
