
Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation

Alexander Matz, Johannes Doerfert, Holger Fröning
IMC Trading B.V., Amsterdam, Netherlands
13th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), 2020

@inproceedings{matz2020automated,
   title={Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation},
   author={Matz, Alexander and Doerfert, Johannes and Fr{\"o}ning, Holger},
   booktitle={13th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2)},
   year={2020}
}


GPUs are well-established in domains outside of computer graphics, including scientific computing, artificial intelligence, data warehousing, and other computationally intensive areas. Their execution model is based on a thread hierarchy and suggests that GPU workloads can generally be safely partitioned along the boundaries of thread blocks. However, the most efficient partitioning strategy depends heavily on the application's memory access patterns, and choosing and implementing it is usually a tedious task for programmers. We leverage this observation for a concept that automatically compiles single-GPU code to multi-GPU applications. We present the idea and a prototype implementation of this concept and validate both on a selection of benchmarks. In particular, we illustrate our use of 1) polyhedral compilation to model memory accesses, 2) a runtime library to track GPU buffers and identify stale data, 3) IR transformations for the partitioning of GPU kernels, and 4) a custom preprocessor that rewrites CUDA host code to utilize multiple GPUs. This work focuses on applications with regular access patterns on global memory and on a toolchain that compiles CUDA applications fully automatically, without requiring any user intervention. Our benchmarks compare single-device CUDA binaries produced by NVIDIA's reference compiler to binaries produced for multiple GPUs using our toolchain. We report speedups of up to 12.4x for 16 Kepler-class GPUs.
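For illustration, the following is a minimal, hypothetical CUDA sketch (not taken from the paper or its toolchain) of the manual transformation the abstract describes being automated: a single-GPU vector-addition kernel whose grid is split along thread-block boundaries, with one contiguous range of blocks launched per device. The kernel name, the block-offset parameter, and the naive full-buffer copies are assumptions made for brevity; the paper's approach instead uses polyhedral access analysis and a runtime buffer tracker to move only the data each device actually needs.

// Hypothetical sketch: partition a data-parallel kernel along thread-block
// boundaries across all visible GPUs. Buffer handling is deliberately naive
// (every device receives full input copies); the described toolchain would
// restrict transfers to the regions each device reads or writes.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <vector>

__global__ void vecAdd(const float *a, const float *b, float *c, int n, int blockOffset) {
    // blockOffset shifts the global index so a partial grid still covers its slice.
    int i = (blockIdx.x + blockOffset) * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int totalBlocks = (n + threads - 1) / threads;

    int numDevices = 1;
    cudaGetDeviceCount(&numDevices);

    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    for (int dev = 0; dev < numDevices; ++dev) {
        cudaSetDevice(dev);

        // Contiguous range of thread blocks assigned to this device.
        int firstBlock = dev * totalBlocks / numDevices;
        int lastBlock  = (dev + 1) * totalBlocks / numDevices;
        int numBlocks  = lastBlock - firstBlock;
        if (numBlocks <= 0) continue;

        float *dA, *dB, *dC;
        cudaMalloc(&dA, n * sizeof(float));
        cudaMalloc(&dB, n * sizeof(float));
        cudaMalloc(&dC, n * sizeof(float));
        cudaMemcpy(dA, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dB, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        // Launch only this device's share of the original grid.
        vecAdd<<<numBlocks, threads>>>(dA, dB, dC, n, firstBlock);

        // Copy back only the elements written by the blocks launched here.
        int first = firstBlock * threads;
        int count = std::min(numBlocks * threads, n - first);
        cudaMemcpy(c.data() + first, dC + first, count * sizeof(float), cudaMemcpyDeviceToHost);

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
    }

    printf("c[0] = %.1f, c[%d] = %.1f\n", c[0], n - 1, c[n - 1]);
    return 0;
}

The sketch processes the devices sequentially for clarity; a realistic version would launch the partial grids concurrently and overlap transfers, which is part of what the host-code rewriting described above takes care of.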
