high performance computing on graphics processing units: hgpu.org


Automatic transformation and optimization of applications on GPUs and GPU clusters

Wenjing Ma

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210

The Ohio State University, 2011

@phdthesis{ma2011automatic,
  title={Automatic transformation and optimization of applications on GPUs and GPU clusters},
  author={Ma, W.},
  year={2011},
  school={THE OHIO STATE UNIVERSITY}
}


Modern accelerators and multi-core architectures offer significant computing power at a very modest cost. This trend raises an important research question on the software side: how to make the best use of these devices, and how to achieve high performance without requiring users to invest heavily in learning the architecture and its programming model. Our goal is to address this problem by developing automatic code generation systems, particularly for GPUs and GPU clusters. We believe that focusing on specific application classes significantly simplifies the task of automatic code generation. We therefore built code generation and optimization systems for two classes of applications: data-intensive applications with generalized reductions, and tensor contraction functions.

First, we focused on a class of data-intensive applications whose processing structure is a generalized reduction. In the code generation systems we built, the user input is an algorithm written in a high-level language, specifically C or MATLAB. Program analysis and code generation are then performed to produce code for a single GPU or for a GPU cluster. We built three specific systems: GREENRIDE, a code generation system that provides GPU support for C programs; GMAT-DM, which translates MATLAB code into an executable GPU program; and AUTO-GC, which provides GPU support for clusters by incorporating code generation for FREERIDE, a middleware that supports parallel execution of data mining applications.

For tensor contractions, we evaluated the automatically generated code on different GPUs and investigated algorithm optimization for each card. This work led to an auto-tuning framework that selects algorithms and parameters using a cost model and thresholds extracted from simple micro-benchmarks. We also developed a loop transformation system for multi-level memory hierarchies.
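The generalized-reduction processing structure targeted by GREENRIDE, GMAT-DM and AUTO-GC can be illustrated with a small CPU-side sketch. It assumes k-means clustering as the example application, and every name in it (`local_reduction`, `global_reduction`, `kmeans_step`) is hypothetical, not the interface of FREERIDE or of the thesis's code generators. Each data element is processed independently and folded into a "reduction object" using only associative and commutative updates, so the input can be split into partitions, each partition reduced into a private copy, and the copies merged afterwards. On a GPU, generated code might give each thread or thread block its own copy (or use atomic updates) and merge them at the end; the sketch only mimics that structure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
RObj = List[List[float]]  # hypothetical reduction object: [sum_x, sum_y, count] per cluster

def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the closest center by squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)

def empty_robj(k: int) -> RObj:
    return [[0.0, 0.0, 0.0] for _ in range(k)]

def local_reduction(chunk: Sequence[Point], centers: Sequence[Point]) -> RObj:
    """Fold one data partition into a private copy of the reduction object."""
    robj = empty_robj(len(centers))
    for p in chunk:
        c = nearest(p, centers)  # which reduction-object entry this element updates
        robj[c][0] += p[0]       # updates use only +, which is associative and commutative
        robj[c][1] += p[1]
        robj[c][2] += 1
    return robj

def global_reduction(copies: Sequence[RObj], k: int) -> RObj:
    """Combine private copies entry by entry (order-independent up to float rounding)."""
    merged = empty_robj(k)
    for robj in copies:
        for c in range(k):
            for f in range(3):
                merged[c][f] += robj[c][f]
    return merged

def kmeans_step(points: Sequence[Point], centers: Sequence[Point],
                num_partitions: int = 4) -> List[Point]:
    # Partition the input, as a runtime would across threads, blocks, or nodes.
    chunks = [points[i::num_partitions] for i in range(num_partitions)]
    merged = global_reduction([local_reduction(ch, centers) for ch in chunks], len(centers))
    # Finalize: each new center is the mean of its points (unchanged if it got none).
    return [(sx / n, sy / n) if n else centers[c]
            for c, (sx, sy, n) in enumerate(merged)]
```

For instance, `kmeans_step([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` returns `[(0.0, 0.5), (10.0, 10.5)]`, and changing `num_partitions` does not change the result here, because the per-partition copies are merged with the same associative, commutative update.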
In the loop transformation system, by focusing on the dominant factors of the computation, we were able to remove a large portion of the extra data movement between levels of the memory hierarchy.

In the future, we plan to extend this work in several directions. The code generation system for data-intensive applications with reduction patterns could be applied to, and optimized for, other classes of applications. The integer programming model could also be applied to other architectures, including future accelerators. We would like to extend the loop transformation approach to heterogeneous systems. Finally, the auto-tuning framework will be extended to cover more parameters, enabling further performance gains.
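The auto-tuning idea described in the abstract, selecting algorithms and parameters from a cost model plus thresholds extracted from micro-benchmarks, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's framework: the two code variants "A" and "B", the benchmark timings, the on-chip capacity, and the data-movement cost model for a tiled matrix-style contraction C[i,j] += A[i,k] * B[k,j] are all hypothetical. The threshold is the smallest benchmarked size at which variant B stops losing; the tile size is the candidate with the lowest modeled global-memory traffic (2n³/t + n² elements) whose three tiles fit in the assumed capacity.

```python
from typing import Dict, Optional, Sequence, Union

def crossover_threshold(sizes: Sequence[int], time_a: Sequence[float],
                        time_b: Sequence[float]) -> Optional[int]:
    """Smallest benchmarked size at which hypothetical variant B is no slower than A.

    `sizes` must be sorted in ascending order; returns None if B never wins.
    """
    for n, ta, tb in zip(sizes, time_a, time_b):
        if tb <= ta:
            return n
    return None

def traffic_elements(n: int, tile: int) -> int:
    """Modeled global-memory traffic (in elements) for a tiled n x n x n
    contraction C[i,j] += A[i,k] * B[k,j]; assumes tile divides n."""
    steps = (n // tile) ** 3          # (output tiles) x (k-steps per output tile)
    loads = steps * 2 * tile * tile   # one A tile and one B tile per step
    stores = n * n                    # every C element is written once
    return loads + stores             # equals 2*n**3/tile + n**2

def best_tile(n: int, candidates: Sequence[int], capacity_elems: int) -> int:
    """Candidate tile with the lowest modeled traffic whose A, B and C tiles fit on chip."""
    feasible = [t for t in candidates if n % t == 0 and 3 * t * t <= capacity_elems]
    if not feasible:
        raise ValueError("no candidate tile fits the assumed on-chip capacity")
    return min(feasible, key=lambda t: traffic_elements(n, t))

def choose_plan(n: int, threshold: Optional[int], candidates: Sequence[int],
                capacity_elems: int) -> Dict[str, Union[str, int]]:
    """Runtime decision: variant from the benchmark threshold, tile from the cost model."""
    variant = "B" if threshold is not None and n >= threshold else "A"
    return {"variant": variant, "tile": best_tile(n, candidates, capacity_elems)}
```

For example, if made-up timings show variant B overtaking A at size 256, `choose_plan(512, 256, [8, 16, 32], 12288)` selects variant B with a tile of 32: the modeled traffic shrinks as the tile grows, and 3 × 32² elements still fit in the assumed capacity of 12288 elements. Adding more tunable parameters would mean adding more thresholds or more terms to the cost model.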

Tags: Algorithm optimization, Algorithms, Benchmarking, Code generation, Computer science, CUDA, Data mining, GPU cluster, Heterogeneous systems, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce 9800 GX2, Optimization, Tesla T10, Thesis

November 9, 2011 by hgpu

