Rank k Cholesky Up/Down-dating on the GPU: gpucholmodV0.2
Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800
arXiv:1011.1173 [cs.DC] (4 Nov 2010)
@article{2010arXiv1011.1173W,
author={Walder}, C.},
title={“{Rank k Cholesky Up/Down-dating on the GPU: gpucholmodV0.2}”},
journal={ArXiv e-prints},
archivePrefix={“arXiv”},
eprint={1011.1173},
primaryClass={“cs.DC”},
keywords={Computer Science – Distributed, Parallel, and Cluster Computing, G.1.0},
year={2010},
month={nov},
adsurl={http://adsabs.harvard.edu/abs/2010arXiv1011.1173W},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
In this note we briefly describe our Cholesky modification algorithm for streaming multiprocessor architectures. Our implementation is available in C++ with Matlab binding, using CUDA to utilise the graphics processing unit (GPU). Limited speed ups are possible due to the bandwidth bound nature of the problem. Furthermore, a complex dependency pattern must be obeyed, requiring multiple kernels to be launched. Nonetheless, this makes for an interesting problem, and our approach can reduce the computation time by a factor of around 7 for matrices of size 5000 by 5000 and k=16, in comparison with the LINPACK suite running on a CPU of comparable vintage. Much larger problems can be handled however due to the O(n) scaling in required GPU memory of our method.
November 9, 2010 by hgpu