Autotuning Stencil-Based Computations on GPUs
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439
Preprint ANL/MCS-P2094-0512, 2012
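@article{mametjanov2012autotuning,
  title={Autotuning Stencil-Based Computations on GPUs},
  author={Mametjanov, A. and Lowell, D. and Ma, C.C. and Norris, B.},
  year={2012},
  note={Preprint ANL/MCS-P2094-0512, Mathematics and Computer Science Division, Argonne National Laboratory}
}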
Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define new matrix and vector data types in the PETSc parallel numerical toolkit. We create tunable CUDA implementations of the operations associated with these types after identifying a number of GPU-specific optimizations and tuning parameters for these operations. We discuss our implementation of GPU autotuning capabilities in the Orio framework and present performance results for several kernels, comparing them with vendor-tuned library implementations.
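To make the "diagonal sparsity pattern" concrete, the following is a minimal sketch of the standard diagonal (DIA) sparse format and its matrix-vector product, applied to the 1D three-point stencil matrix tridiag(-1, 2, -1). It is not the extended representation or the PETSc/CUDA code described in the abstract. The function names (`dia_from_stencil_1d`, `dia_matvec`) and the row-indexed storage layout are assumptions chosen for illustration.

```python
# Illustrative sketch only: standard DIA storage, not the paper's extended format.
# All names here are hypothetical and do not correspond to PETSc or Orio APIs.

def dia_from_stencil_1d(n):
    """Build DIA storage for the n x n matrix tridiag(-1, 2, -1)."""
    offsets = [-1, 0, 1]  # sub-, main, and super-diagonal
    # Assumed layout: data[d][i] holds A[i][i + offsets[d]].
    # Slots that would fall outside the matrix stay padded with 0.
    data = [[0.0] * n for _ in offsets]
    for i in range(n):
        if i - 1 >= 0:
            data[0][i] = -1.0
        data[1][i] = 2.0
        if i + 1 < n:
            data[2][i] = -1.0
    return offsets, data


def dia_matvec(offsets, data, x):
    """Compute y = A x for a matrix A stored in DIA form."""
    n = len(x)
    y = [0.0] * n
    # Each row is independent, so in a GPU kernel a row would
    # typically be assigned to one thread.
    for i in range(n):
        acc = 0.0
        for d, off in enumerate(offsets):
            j = i + off
            if 0 <= j < n:  # skip padded entries outside the matrix
                acc += data[d][i] * x[j]
        y[i] = acc
    return y


offsets, data = dia_from_stencil_1d(5)
y = dia_matvec(offsets, data, [1.0, 2.0, 3.0, 4.0, 5.0])
print(y)  # [0.0, 0.0, 0.0, 0.0, 6.0]
```

Because only the diagonals are stored, no per-entry column indices are needed and consecutive rows read consecutive entries of each diagonal. This regular access pattern is the property a GPU implementation would aim to exploit; the number of threads per block or rows per thread are examples of parameters an autotuner could then search over.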
June 9, 2012 by hgpu