An auto-tuning framework for parallel multicore stencil computations

Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams
CRD, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010


   title={An auto-tuning framework for parallel multicore stencil computations},
   author={Kamil, S. and Chan, C. and Oliker, L. and Shalf, J. and Williams, S.},
   booktitle={Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on},

Although stencil auto-tuning has shown tremendous potential for effectively utilizing architectural resources, it has hitherto been limited to single-kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to package as a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity by automatically converting a straightforward sequential Fortran 95 stencil expression into tuned parallel implementations in Fortran, C, or CUDA. This enables performance portability across diverse computer architectures, including the AMD Barcelona, Intel Nehalem, Sun Victoria Falls, and the latest NVIDIA GPUs. Results show that our generalized methodology delivers speedups of up to 22x over the reference serial implementation. Overall, we demonstrate that such domain-specific auto-tuners hold enormous promise for architectural efficiency, programmer productivity, performance portability, and algorithmic adaptability on existing and emerging multicore systems.
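To make the core idea of stencil auto-tuning concrete, here is a minimal, hypothetical C sketch. It is not the authors' framework: it has no Fortran 95 front end, no code generator, and no parallel or CUDA back end. Instead it hand-writes one assumed kernel (a 3D 7-point Laplacian), one tunable cache-blocked variant, and an exhaustive search over an assumed list of block sizes, keeping the fastest variant whose output matches the serial reference. All names (`GRID_N`, `stencil_at`, `laplacian_ref`, `laplacian_blocked`, `autotune`) are illustrative only.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical grid size: GRID_N^3 points including a one-point halo on each face. */
#define GRID_N 66

/* Flat index with unit stride in i (an assumed memory layout). */
#define IDX(i, j, k) ((size_t)(k) * GRID_N * GRID_N + (size_t)(j) * GRID_N + (size_t)(i))

/* 7-point Laplacian at one interior point; shared by both variants so they compute identical values. */
static inline double stencil_at(const double *in, int i, int j, int k) {
    return -6.0 * in[IDX(i, j, k)]
         + in[IDX(i - 1, j, k)] + in[IDX(i + 1, j, k)]
         + in[IDX(i, j - 1, k)] + in[IDX(i, j + 1, k)]
         + in[IDX(i, j, k - 1)] + in[IDX(i, j, k + 1)];
}

/* Reference: a plain triple loop over the interior, playing the role of the serial baseline. */
static void laplacian_ref(const double *in, double *out) {
    for (int k = 1; k < GRID_N - 1; k++)
        for (int j = 1; j < GRID_N - 1; j++)
            for (int i = 1; i < GRID_N - 1; i++)
                out[IDX(i, j, k)] = stencil_at(in, i, j, k);
}

/* Tunable variant: tile the i-j plane into bx-by-by blocks and stream each tile through k. */
static void laplacian_blocked(const double *in, double *out, int bx, int by) {
    assert(bx > 0 && by > 0);
    for (int jj = 1; jj < GRID_N - 1; jj += by)
        for (int ii = 1; ii < GRID_N - 1; ii += bx)
            for (int k = 1; k < GRID_N - 1; k++)
                for (int j = jj; j < jj + by && j < GRID_N - 1; j++)
                    for (int i = ii; i < ii + bx && i < GRID_N - 1; i++)
                        out[IDX(i, j, k)] = stencil_at(in, i, j, k);
}

/* Largest element-wise absolute difference between two full grids. */
static double max_abs_diff(const double *a, const double *b) {
    double m = 0.0;
    for (size_t n = 0; n < (size_t)GRID_N * GRID_N * GRID_N; n++) {
        double d = a[n] > b[n] ? a[n] - b[n] : b[n] - a[n];
        if (d > m) m = d;
    }
    return m;
}

typedef struct { int bx, by; double seconds; } tune_result;

/* Exhaustive search over cands x cands: time each variant once, discard any whose
 * output disagrees with ref, and return the fastest surviving block size.
 * A result with seconds == DBL_MAX means no candidate was accepted. */
static tune_result autotune(const double *in, double *out, const double *ref,
                            const int *cands, int ncand) {
    tune_result best = { 0, 0, DBL_MAX };
    for (int a = 0; a < ncand; a++) {
        for (int b = 0; b < ncand; b++) {
            clock_t t0 = clock();
            laplacian_blocked(in, out, cands[a], cands[b]);
            double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
            if (max_abs_diff(out, ref) > 1e-12) continue;  /* reject incorrect variants */
            if (secs < best.seconds) best = (tune_result){ cands[a], cands[b], secs };
        }
    }
    return best;
}
```

A caller would allocate the three grids with `calloc` (so the untouched halo cells agree), compute `ref` once with `laplacian_ref`, and pass candidate sizes such as `{8, 16, 32, 64}`. Timing a single run with `clock()` is coarse; a more careful tuner would typically repeat measurements, prune the search space, and explore further parameters such as thread decomposition or unrolling.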