Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications

Mohamed Wahib, Naoya Maruyama
RIKEN Advanced Institute for Computational Science, Kobe, Japan
ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC’15), 2015








This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels with auto-generated kernels optimized for data reuse. The transformation is based on two basic operations, kernel fusion and fission, and relies on a series of automated steps: gathering metadata, generating graphs that express dependencies and precedence constraints, searching for optimal kernel fissions and fusions, and generating optimized code. The framework is designed to provide the flexibility required to accommodate different applications, allowing the programmer to monitor and amend the intermediate results of the different transformation phases. We demonstrate the practicality and effectiveness of automatic transformations in exploiting exposed data locality using a variety of real-world applications with large codebases that contain dozens of kernels and data arrays. Experimental results show that the proposed end-to-end automated approach, with minimal intervention from the user, improved the performance of six applications, with speedups ranging from 1.12x to 1.76x.
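To make the idea of kernel fusion for data reuse concrete, here is a minimal sketch. It is not the framework's generated code, and it is not CUDA: each "kernel" is modeled as a plain Python loop over a 1D grid, and a hypothetical `CountingArray` wrapper counts element loads as a stand-in for global-memory reads. The names `kernel_a`, `kernel_b`, and `fused_ab` are illustrative assumptions, not taken from the paper.

```python
class CountingArray:
    """Hypothetical stand-in for a global-memory array; counts element loads."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i]

    def __len__(self):
        return len(self.data)


def kernel_a(u, out):
    # 3-point stencil: average of each interior point and its neighbours.
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0


def kernel_b(u, out):
    # 3-point stencil: discrete second difference on the same input.
    for i in range(1, len(u) - 1):
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]


def fused_ab(u, out_a, out_b):
    # Fused version: each input value is loaded once and reused by both stencils.
    for i in range(1, len(u) - 1):
        left, mid, right = u[i - 1], u[i], u[i + 1]
        out_a[i] = (left + mid + right) / 3.0
        out_b[i] = left - 2.0 * mid + right


def run_unfused(values):
    u = CountingArray(values)
    a = [0.0] * len(values)
    b = [0.0] * len(values)
    kernel_a(u, a)
    kernel_b(u, b)
    return a, b, u.reads


def run_fused(values):
    u = CountingArray(values)
    a = [0.0] * len(values)
    b = [0.0] * len(values)
    fused_ab(u, a, b)
    return a, b, u.reads


if __name__ == "__main__":
    vals = [float(i * i) for i in range(8)]
    print(run_unfused(vals)[2], run_fused(vals)[2])
```

In this toy model the two separate kernels each reload the shared input, while the fused loop halves the number of loads and produces the same outputs. A real GPU implementation would also have to account for halo regions, shared-memory capacity, and register pressure, which is presumably part of why the choice of which kernels to fuse or split is treated as a search problem.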
