On Optimizing Complex Stencils on GPUs

Prashant Singh Rawat, Miheer Vaidya, Aravind Sukumaran-Rajam, Atanas Rountev, Louis-Noël Pouchet, P. Sadayappan
The Ohio State University, USA
IEEE Parallel & Distributed Processing Symposium (IPDPS’19), 2019








Stencil computations are often the compute-intensive kernel in many scientific applications. With the increasing demand for computational accuracy, and the emergence of massively data-parallel high-bandwidth architectures like GPUs, stencils have steadily become more complex in terms of the stencil order, data accesses, and reuse patterns. Many prior efforts have focused on optimizing simpler stencil computations on various platforms. However, existing stencil code generators face challenges in optimizing such complex multi-statement stencil DAGs. This paper addresses the challenges in optimizing high-order stencil DAGs on GPUs by focusing on two key considerations: (1) enabling the domain expert to guide the code optimization, which may otherwise be extremely challenging for complex stencils; and (2) using bottleneck analysis via runtime profiling to guide the application of optimizations, and the tuning of various code generation parameters. We implement these abstractions in a prototype code generation framework termed ARTEMIS, and evaluate its efficacy over multiple stencil kernels with varying complexity and operational intensity on an NVIDIA P100 GPU.
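For readers unfamiliar with the terminology, the following is a minimal Python sketch of what a "multi-statement stencil DAG" can look like. It is not taken from the paper and does not reflect how ARTEMIS generates code. The function names `apply_stencil` and `stencil_dag` and all coefficient values are hypothetical, chosen only for illustration. Each output point is a weighted sum of its neighbors, and the second statement consumes the result of the first, which is the kind of producer-consumer dependence the abstract refers to.

```python
def apply_stencil(u, coeffs):
    """Apply a symmetric 1D stencil; coeffs[k] weights u[i-k] and u[i+k]."""
    r = len(coeffs) - 1  # stencil radius; a larger radius means a higher order
    out = list(u)        # boundary points are copied unchanged
    for i in range(r, len(u) - r):
        acc = coeffs[0] * u[i]
        for k in range(1, r + 1):
            acc += coeffs[k] * (u[i - k] + u[i + k])
        out[i] = acc
    return out


def stencil_dag(u):
    """Two dependent stencil statements (hypothetical coefficients)."""
    # Statement 1: radius-1 second-difference stencil.
    tmp = apply_stencil(u, [-2.0, 1.0])
    # Statement 2: radius-2 stencil that reads the output of statement 1.
    return apply_stencil(tmp, [0.5, 0.2, 0.05])


if __name__ == "__main__":
    print(apply_stencil([0, 1, 4, 9, 16], [-2.0, 1.0]))
    print(stencil_dag([1] * 7))
```

As the radius grows, each output point reads more neighboring values, which increases both the data reuse available between adjacent points and the pressure on fast on-chip storage; that trade-off is part of what makes high-order stencils harder to optimize than simple ones.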