
Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation

Alexander Matz, Johannes Doerfert, Holger Fröning
IMC Trading B.V., Amsterdam, Netherlands
13th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), 2020

@inproceedings{matz2020automated,
   title={Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation},
   author={Matz, Alexander and Doerfert, Johannes and Fr{\"o}ning, Holger},
   booktitle={13th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2)},
   year={2020}
}


GPUs are well-established in domains outside of computer graphics, including scientific computing, artificial intelligence, data warehousing, and other computationally intensive areas. Their execution model is based on a thread hierarchy and suggests that GPU workloads can generally be safely partitioned along the boundaries of thread blocks. However, the most efficient partitioning strategy depends heavily on the application's memory access patterns, and choosing and implementing it is usually a tedious task for programmers. We leverage this observation in a concept that automatically compiles single-GPU code into multi-GPU applications. We present the idea and a prototype implementation of this concept and validate both on a selection of benchmarks. In particular, we illustrate our use of 1) polyhedral compilation to model memory accesses, 2) a runtime library to track GPU buffers and identify stale data, 3) IR transformations for the partitioning of GPU kernels, and 4) a custom preprocessor that rewrites CUDA host code to utilize multiple GPUs. This work focuses on applications with regular access patterns on global memory and on a toolchain that compiles CUDA applications fully automatically, without requiring any user intervention. Our benchmarks compare single-device CUDA binaries produced by NVIDIA's reference compiler against binaries produced for multiple GPUs using our toolchain. We report speedups of up to 12.4x for 16 Kepler-class GPUs.
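The kind of transformation the toolchain automates can be pictured with a hand-written sketch: the grid of a data-parallel kernel is split along thread-block boundaries, each GPU receives only the slice of the buffer its blocks access, and an element offset keeps global indices consistent. The CUDA code below is a minimal illustration under these assumptions (toy kernel, contiguous 1D slices, no stale-data tracking); it is not the paper's toolchain, only a sketch of the partitioning idea it automates.

#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <vector>

// Toy data-parallel kernel: each device works on a contiguous slice of
// the original buffer, but recovers the original global index for the
// bounds check (and, in real codes, for index-dependent computation).
__global__ void scale(float *slice, float factor, int firstElem, int n) {
    int local  = blockIdx.x * blockDim.x + threadIdx.x;
    int global = firstElem + local;
    if (global < n) slice[local] *= factor;
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<float> host(n, 1.0f);

    int devices = 0;
    cudaGetDeviceCount(&devices);
    if (devices < 1) { std::puts("no CUDA device found"); return 1; }

    // Split the grid along thread-block boundaries.
    const int blocksPerDev = (blocks + devices - 1) / devices;
    std::vector<float *> devBuf(devices, nullptr);

    for (int d = 0; d < devices; ++d) {
        int firstBlock = d * blocksPerDev;
        int myBlocks   = std::min(blocksPerDev, blocks - firstBlock);
        if (myBlocks <= 0) continue;
        int firstElem = firstBlock * threads;
        int myElems   = std::min(myBlocks * threads, n - firstElem);

        cudaSetDevice(d);
        cudaMalloc(&devBuf[d], (size_t)myElems * sizeof(float));
        // Copy only the data this device's blocks will access.
        cudaMemcpy(devBuf[d], host.data() + firstElem,
                   (size_t)myElems * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<myBlocks, threads>>>(devBuf[d], 2.0f, firstElem, n);
    }

    // Gather the partial results back into the host buffer.
    for (int d = 0; d < devices; ++d) {
        if (!devBuf[d]) continue;
        int firstElem = d * blocksPerDev * threads;
        int myElems   = std::min(blocksPerDev * threads, n - firstElem);
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaMemcpy(host.data() + firstElem, devBuf[d],
                   (size_t)myElems * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(devBuf[d]);
    }
    std::printf("host[0] = %f\n", host[0]);
    return 0;
}

In the paper's approach this splitting is derived automatically: polyhedral analysis of the kernel's memory accesses determines which data each block slice touches, so the host-side allocation, transfer, and launch rewriting sketched above do not have to be written by hand.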
