## Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706

arXiv:1509.07919 [cs.DC], (25 Sep 2015)

@article{li2015analysis,
   title={Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards},
   author={Li, Ang and Serban, Radu and Negrut, Dan},
   year={2015},
   month={sep},
   archivePrefix={"arXiv"},
   primaryClass={cs.DC}
}

We discuss an approach for solving sparse or dense banded linear systems ${\bf A}{\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large; i.e., $10000 \leq N \leq 500000$. The ${\it split and parallelize}$ (${\tt SaP}$) approach seeks to partition the matrix ${\bf A}$ into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are independently factored in parallel. The solution strategy may either account for or ignore the matrices that couple the diagonal sub-blocks ${\bf A}_i$. This approach, along with the Krylov subspace-based iterative method that it preconditions, is implemented in a solver called ${\tt SaP::GPU}$, which is compared in terms of efficiency with three commonly used sparse direct solvers: ${\tt PARDISO}$, ${\tt SuperLU}$, and ${\tt MUMPS}$. ${\tt SaP::GPU}$, which runs entirely on the GPU except for several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel’s ${\tt MKL}$, ${\tt SaP::GPU}$ also fares well when used to solve dense banded systems that are close to being diagonally dominant. ${\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.
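To make the splitting idea concrete, the following is a minimal CPU sketch of the decoupled variant, in which the coupling matrices are ignored and the preconditioner is simply the block diagonal $\mathrm{diag}({\bf A}_1,\ldots,{\bf A}_P)$. It is not the ${\tt SaP::GPU}$ implementation. Several things here are assumptions made for illustration: all function names are hypothetical, the matrix is stored as dense Python lists rather than in banded GPU storage, each block is solved by plain Gaussian elimination instead of a stored factorization, and a simple preconditioned Richardson iteration stands in for the Krylov subspace method that the abstract mentions.

```python
def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    y = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(aug[k][c] * y[c] for c in range(k + 1, n))
        y[k] = (aug[k][n] - s) / aug[k][k]
    return y


def split_blocks(A, P):
    """Return (start, end, A_i) for P contiguous diagonal sub-blocks of A."""
    n = len(A)
    bounds = [i * n // P for i in range(P + 1)]
    return [(s, e, [row[s:e] for row in A[s:e]])
            for s, e in zip(bounds, bounds[1:])]


def apply_preconditioner(blocks, r):
    """z = M^{-1} r with M = diag(A_1, ..., A_P); coupling blocks are ignored."""
    z = [0.0] * len(r)
    for s, e, Ai in blocks:  # the P solves are independent of each other
        z[s:e] = solve_dense(Ai, r[s:e])
    return z


def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def sap_decoupled_solve(A, b, P=4, tol=1e-10, max_iter=200):
    """Preconditioned Richardson iteration: x <- x + M^{-1} (b - A x).

    Assumption: Richardson is a simple stand-in for a Krylov method.
    Returns the approximate solution and the number of iterations used.
    """
    blocks = split_blocks(A, P)
    x = [0.0] * len(b)
    for it in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        if max(abs(v) for v in r) < tol:
            return x, it
        z = apply_preconditioner(blocks, r)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter


def banded_test_matrix(n, k):
    """Nonsymmetric, strictly diagonally dominant matrix, half-bandwidth k."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - k), min(n, i + k + 1)):
            if j != i:
                A[i][j] = -1.0 if j < i else 0.5
        A[i][i] = 2.0 * sum(abs(v) for v in A[i]) + 1.0
    return A
```

For example, with `A = banded_test_matrix(40, 2)` and `b = matvec(A, x_true)` for some known `x_true`, calling `sap_decoupled_solve(A, b, P=4)` should recover `x_true` to within the tolerance. The sketch relies on diagonal dominance: dropping the coupling blocks is only a good approximation when the off-diagonal coupling is weak relative to the diagonal blocks, which is consistent with the abstract's remark about systems that are close to being diagonally dominant.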

September 30, 2015 by hgpu