Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution
Stony Brook University, Stony Brook, NY, USA
Workshop on LLVM in Parallel Processing (LLPP), 2023
@article{tian2023maximizing,
  title={Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution},
  author={Tian, Shilei and Chapman, Barbara and Doerfert, Johannes},
  year={2023}
}
GPUs are renowned for their exceptional computational acceleration achieved through massive parallelism. However, utilizing GPUs for computation requires manually identifying code regions suitable for offloading, managing data transfers, and handling synchronization. Recent advancements have capitalized on the LLVM/OpenMP portable target offloading interface to push GPU acceleration further. This approach, known as direct GPU compilation, compiles the entire host application for execution on the GPU, eliminating the need for explicit offloading directives. However, direct GPU compilation is limited to the thread parallelism the CPU application exposes, which is often not enough to saturate a modern GPU. This paper explores an alternative approach to enhance parallelism by enabling ensemble execution. We introduce a proof-of-concept implementation that maps each invocation of the application on a different input to an individual team executed by the same GPU kernel. Our enhanced GPU loader can read the command line arguments for the different instances from a file to improve usability. Through an extensive evaluation using four benchmarks, we observe up to a 51X speedup for 64 instances. This demonstrates the effectiveness of ensemble execution in improving parallelism and optimizing GPU utilization for CPU programs compiled for and executed directly on the GPU.
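To make the ensemble-execution idea concrete, the following is a minimal sketch (not the authors' implementation) of mapping each application instance to one OpenMP team inside a single target kernel. The function user_main, the integer inputs, and the hard-coded instance count are hypothetical placeholders; in the paper's setup, the enhanced GPU loader instead reads one command line per instance from a file.

/* Minimal sketch of ensemble execution: one OpenMP team per application
 * instance, all launched as a single GPU kernel. Placeholder names only. */
#include <omp.h>
#include <stdio.h>

#pragma omp declare target
/* Hypothetical stand-in for the original CPU application's entry point;
 * it consumes a per-instance input value instead of a real argv. */
static int user_main(int instance_id, int input) {
  printf("instance %d processing input %d\n", instance_id, input);
  return 0;
}
#pragma omp end declare target

int main(void) {
  /* Per-instance inputs; the real loader would materialize these from a
   * file of command lines, one per instance. */
  int inputs[] = {10, 20, 30, 40};
  const int num_instances = 4;

  /* One team per instance, all inside the same kernel launch. */
  #pragma omp target teams num_teams(num_instances) map(to: inputs[0:num_instances])
  {
    int team = omp_get_team_num();
    if (team < num_instances)
      user_main(team, inputs[team]);
  }
  return 0;
}

Compiled with an offloading-capable compiler (e.g., clang with -fopenmp and an appropriate offload target), each team executes an independent instance of the application logic, which is the source of the additional parallelism the paper exploits.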
July 24, 2023 by hgpu