An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU
University of California, Davis
IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2011
@inproceedings{davidson2011auto,
title={An auto-tuned method for solving large tridiagonal systems on the GPU},
author={Davidson, A. and Zhang, Y. and Owens, J.D.},
booktitle={Parallel \& Distributed Processing Symposium (IPDPS), 2011 IEEE International},
pages={956--965},
year={2011},
organization={IEEE}
}
We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage nature of our method, together with varied workloads and GPUs of different capabilities, calls for an auto-tuning strategy that carefully selects the switch points between computation stages. In particular, we show two ways to effectively prune the tuning space and thus avoid an impractical exhaustive search: (1) apply algorithmic knowledge to decouple tuning parameters, and (2) estimate search starting points based on GPU architecture parameters. We demonstrate that auto-tuning is a powerful tool that improves performance by up to 5x, saves on average 17% and 32% of execution time over static and dynamic tuning, respectively, and enables our multi-stage solver to outperform the Intel MKL tridiagonal solver on many parallel tridiagonal systems by 6-11x.
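The idea of splitting a system into smaller ones and then solving them can be illustrated with a short serial sketch. The abstract does not name the splitting technique, so this example assumes a single parallel cyclic reduction (PCR) step as a stand-in: it decouples one tridiagonal system into two independent half-size systems (even-indexed and odd-indexed unknowns), each of which is then solved with the Thomas algorithm. The function names `thomas`, `pcr_step`, and `solve_split` are hypothetical, the code is plain Python rather than GPU code, and it assumes a diagonally dominant system so that no pivoting is needed.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    a[0] and c[-1] do not affect the result. No pivoting is performed.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr_step(a, b, c, d):
    """One PCR step (assumed splitting technique, not taken from the abstract).

    Each equation i eliminates x[i-1] and x[i+1] using its two neighbours,
    so afterwards it couples only x[i-2], x[i], and x[i+2].
    """
    n = len(b)
    na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
    for i in range(n):
        if i - 1 >= 0:
            k1 = a[i] / b[i - 1]
            na[i] = -a[i - 1] * k1
            nb[i] -= c[i - 1] * k1
            nd[i] -= d[i - 1] * k1
        if i + 1 < n:
            k2 = c[i] / b[i + 1]
            nc[i] = -c[i + 1] * k2
            nb[i] -= a[i + 1] * k2
            nd[i] -= d[i + 1] * k2
    return na, nb, nc, nd


def solve_split(a, b, c, d):
    """Split once with PCR, then solve the two half-size systems independently."""
    na, nb, nc, nd = pcr_step(a, b, c, d)
    x = [0.0] * len(b)
    for start in (0, 1):  # even-indexed, then odd-indexed unknowns
        sl = slice(start, None, 2)
        if nb[sl]:
            x[sl] = thomas(na[sl], nb[sl], nc[sl], nd[sl])
    return x
```

In a GPU setting, the two subsystems produced by the split are independent and could be processed concurrently; how many times to split before switching to an on-chip solver is the kind of switch point the abstract says is auto-tuned.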
October 16, 2011 by hgpu