Policy-based Tuning for Performance Portability and Library Co-optimization

Duane Merrill, Michael Garland, Andrew Grimshaw
NVIDIA Corporation, Santa Clara, California, USA
Innovative Parallel Computing (InPar 2012), 2012






Although modular programming is a fundamental software development practice, software reuse within contemporary GPU kernels is uncommon. For GPU software assets to be reusable across problem instances, they must be inherently flexible and tunable. To illustrate, we survey the performance-portability landscape for a suite of common GPU primitives, evaluating thousands of reasonable program variants across a wide range of problem instances (microarchitecture, problem size, and data type). While individual specializations provide excellent performance for specific instances, we find no variants with "universally reasonable" performance. In this paper, we present a policy-based design idiom for constructing reusable, tunable software components that can be co-optimized with the enclosing kernel for the specific problem and processor at hand. In particular, this approach enables flexible granularity coarsening, which allows the expensive aspects of communication and the redundant aspects of data parallelism to scale with the width of the processor rather than with the problem size. From a small library of tunable device subroutines, we have constructed the fastest, most versatile GPU primitives for reduction, prefix and segmented scan, duplicate removal, reduction-by-key, sorting, and sparse graph traversal.
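To make the policy-based idiom concrete, here is a minimal host-side C++ sketch. It is not the authors' code: the names ReducePolicy, tile_reduce, and reduce are hypothetical, and the "threads" are emulated with ordinary loops rather than launched on a GPU. It only illustrates the general shape of the idea, assuming a tuning policy is a compile-time type that a reusable subroutine and its enclosing routine both read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tuning policy (illustrative names, not from the paper).
// The tuning knobs are compile-time constants carried by a type.
template <int THREADS_, int ITEMS_PER_THREAD_>
struct ReducePolicy {
    static constexpr int THREADS = THREADS_;                    // logical threads per tile
    static constexpr int ITEMS_PER_THREAD = ITEMS_PER_THREAD_;  // granularity coarsening factor
    static constexpr int TILE_ITEMS = THREADS * ITEMS_PER_THREAD;
};

// Reusable subroutine, specialized by the policy. Each emulated thread
// first reduces its own ITEMS_PER_THREAD items serially; the per-thread
// partials are then combined in a single cooperative step per tile.
template <typename Policy, typename T>
T tile_reduce(const T* tile, std::size_t valid_items) {
    assert(valid_items <= static_cast<std::size_t>(Policy::TILE_ITEMS));
    T partials[Policy::THREADS] = {};
    for (int t = 0; t < Policy::THREADS; ++t) {
        for (int i = 0; i < Policy::ITEMS_PER_THREAD; ++i) {
            std::size_t idx =
                static_cast<std::size_t>(t) * Policy::ITEMS_PER_THREAD + i;
            if (idx < valid_items) partials[t] += tile[idx];
        }
    }
    T total = T();
    for (int t = 0; t < Policy::THREADS; ++t) total += partials[t];
    return total;
}

// Enclosing routine: shares the same policy type, so the subroutine and
// its caller are specialized together for one configuration.
template <typename Policy, typename T>
T reduce(const std::vector<T>& in) {
    const std::size_t tile_items = Policy::TILE_ITEMS;
    T total = T();
    for (std::size_t base = 0; base < in.size(); base += tile_items) {
        std::size_t remaining = in.size() - base;
        std::size_t valid = remaining < tile_items ? remaining : tile_items;
        total += tile_reduce<Policy>(in.data() + base, valid);
    }
    return total;
}
```

In this sketch, reduce<ReducePolicy<128, 4>>(data) and reduce<ReducePolicy<32, 1>>(data) compute the same sum. Only the granularity differs: a larger ITEMS_PER_THREAD shifts work into the serial per-thread loop, so the number of cooperative combine steps shrinks for a given input size.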