1163

Fast tridiagonal solvers on the GPU

Yao Zhang, Jonathan Cohen, John D. Owens
University of California, Davis
In PPoPP ’10: Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming (2010), pp. 127-136.

@conference{zhang2010fast,

   title={Fast tridiagonal solvers on the GPU},

   author={Zhang, Y. and Cohen, J. and Owens, J.D.},

   booktitle={Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel computing},

   pages={127–136},

   year={2010},

   organization={ACM}

}

Download Download (PDF)   View View   Source Source   

2193

views

We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithm complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work each step. To combine the benefits of the basic algorithms, we propose hybrid CR+PCR and CR+RD algorithms, which improve the performance of PCR, RD and CR by 21%, 31% and 61% respectively. Our GPU solvers achieve up to a 28x speedup over a sequential LAPACK solver, and a 12x speedup over a multi-threaded CPU solver.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: