high performance computing on graphics processing units: hgpu.org


Autotuning for Automatic Parallelization on Heterogeneous Systems

Philip Pfaffe

KIT-Fakultät für Informatik des Karlsruher Instituts für Technologie (KIT)

KIT-Bibliothek, 2020

DOI:10.5445/IR/1000119646

@article{pfaffe2019autotuning,
  title={Autotuning for Automatic Parallelization on Heterogeneous Systems},
  author={Pfaffe, Philip},
  year={2019}
}


To meet the surging demand for high-speed computation in an era of stagnating per-processor performance gains, system designers resort to aggregating many, and even heterogeneous, processors into single systems. Automatic parallelization tools relieve application developers of the tedious and error-prone task of programming these heterogeneous systems. For these tools, there are two aspects to maximizing performance: optimizing the execution on each parallel platform individually, and executing work on the available platforms cooperatively. To date, various approaches target one aspect or the other. Automatic parallelization for simultaneous cooperative computation with optimized per-platform execution, however, remains an unsolved problem. This thesis presents the APHES framework to close that gap. The framework combines automatic parallelization with a novel technique for input-sensitive online autotuning. Its first component, a parallelizing polyhedral compiler, transforms implicitly data-parallel program parts for multiple platforms. The targeted platforms then automatically cooperate to process the work. During compilation, the code is instrumented to interact with libtuning, our new autotuner and the second component of the framework. Tuning the work distribution and the per-platform execution maximizes overall performance. The autotuner enables always-on autotuning through a novel hybrid tuning method that combines a new efficient search technique with model-based prediction. Experiments show that the APHES framework can solve the cooperative heterogeneous parallelization problem and that cooperative execution outperforms versions parallelized for a single platform. On benchmarks from the PolyBench benchmark suite, the APHES-transformed programs achieve a speedup of up to 6x compared to program versions generated by state-of-the-art single-platform parallelizers. The libtuning autotuner reduces the search time by up to 30% compared to state-of-the-art autotuning while still finding competitive configurations. Additionally, model-based prediction is able to reduce the search overhead by 99%.
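The abstract does not describe libtuning's interface or its search technique, so the following is only a minimal sketch of the general idea of tuning a work distribution between two cooperating platforms. Everything in it is an assumption made for illustration: the simulated cost model, the throughput numbers, the names `cooperative_time` and `tune_split`, and the use of a plain ternary search in place of the thesis's hybrid method.

```python
def cooperative_time(split, work_items, cpu_rate, gpu_rate):
    """Simulated wall time when two platforms process the work concurrently.

    `split` is the fraction of the work given to the CPU; the rest goes to
    the GPU. The run finishes when the slower platform finishes.
    This cost model is hypothetical, not taken from the thesis.
    """
    cpu_time = split * work_items / cpu_rate
    gpu_time = (1.0 - split) * work_items / gpu_rate
    return max(cpu_time, gpu_time)


def tune_split(measure, lo=0.0, hi=1.0, iterations=40):
    """Ternary search for the split that minimizes `measure(split)`.

    Assumes the measured time is unimodal in the split. That holds for the
    simulated cost above, but is only an assumption for real hardware.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if measure(m1) < measure(m2):
            hi = m2  # the minimum cannot lie to the right of m2
        else:
            lo = m1  # the minimum cannot lie to the left of m1
    return (lo + hi) / 2.0


# Hypothetical throughputs in items per second, chosen only for illustration.
CPU_RATE = 2.0e6
GPU_RATE = 6.0e6
WORK_ITEMS = 1_000_000

best_split = tune_split(
    lambda s: cooperative_time(s, WORK_ITEMS, CPU_RATE, GPU_RATE)
)
best_time = cooperative_time(best_split, WORK_ITEMS, CPU_RATE, GPU_RATE)
gpu_only_time = cooperative_time(0.0, WORK_ITEMS, CPU_RATE, GPU_RATE)
```

In this toy model the search settles on the split at which both platforms finish at the same time, and the resulting cooperative time is lower than running everything on the faster platform alone. A real online tuner would replace the simulated `measure` with timed executions of the instrumented code.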

Tags: Auto-Tuning, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 970, OpenMP, Performance, Tesla P100, Thesis

June 21, 2020 by hgpu

