Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU

Luc Buatois, Guillaume Caumon, Bruno Lévy
Gocad Research Group, INRIA, Nancy Université, France
High Performance Computing and Communications, Lecture Notes in Computer Science, 2007, Volume 4782/2007, 358-371


   title={Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU},

   author={Buatois, L. and Caumon, G. and L{\'e}vy, B.},

   journal={High Performance Computing and Communications},








A wide class of geometry processing and PDE solution methods needs to solve a linear system in which the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. GPUs, with their ever-growing parallel computing power, are a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a general-purpose sparse linear solver that outperforms leading-edge CPU counterparts (MKL / ACML).
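
To make the storage strategy named in the abstract more concrete, here is a minimal CPU sketch of block compressed row storage (BCRS) and a sparse matrix-vector product over it. It is an illustration under stated assumptions, not the authors' GPU implementation: the function names `dense_to_bcrs` and `bcrs_spmv`, the 2x2 block size, and the small tridiagonal test matrix are all hypothetical choices made for this example. The local accumulator list `acc` and the slice `xs` only mimic the idea behind register blocking, where each block's partial sums and its piece of the input vector are kept in fast local storage and reused.

```python
def dense_to_bcrs(A, bs=2):
    """Convert a dense square matrix (list of rows) to BCRS with bs x bs blocks.

    Returns (row_ptr, col_idx, values):
      row_ptr[i] .. row_ptr[i+1] index the stored blocks of block row i,
      col_idx[k] is the block column of stored block k,
      values holds each stored block row-major, bs*bs entries per block.
    """
    n = len(A)
    assert n % bs == 0, "matrix size must be a multiple of the block size"
    nb = n // bs
    row_ptr, col_idx, values = [0], [], []
    for bi in range(nb):
        for bj in range(nb):
            block = [A[bi * bs + r][bj * bs + c]
                     for r in range(bs) for c in range(bs)]
            # A block is stored as soon as it contains one non-zero entry.
            if any(v != 0 for v in block):
                col_idx.append(bj)
                values.extend(block)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def bcrs_spmv(row_ptr, col_idx, values, x, bs=2):
    """Compute y = A @ x for a matrix stored in BCRS."""
    nb = len(row_ptr) - 1
    y = [0.0] * (nb * bs)
    for bi in range(nb):
        acc = [0.0] * bs  # per-block-row partial sums (stand-in for registers)
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            xs = x[bj * bs:(bj + 1) * bs]  # fetched once, reused by bs rows
            base = k * bs * bs
            for r in range(bs):
                for c in range(bs):
                    acc[r] += values[base + r * bs + c] * xs[c]
        y[bi * bs:(bi + 1) * bs] = acc
    return y


# Hypothetical example: a 6x6 tridiagonal matrix (4 on the diagonal, -1 next to it).
n = 6
A = [[4 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(n)]
     for i in range(n)]
x = [1, 2, 3, 4, 5, 6]

row_ptr, col_idx, values = dense_to_bcrs(A)
y = bcrs_spmv(row_ptr, col_idx, values, x)
print(row_ptr, col_idx, y)
```

Compared with plain compressed row storage, one column index is stored per block rather than per non-zero entry, and each fetched piece of `x` is reused across all rows of the block. The cost is that zeros inside a partially filled block are stored explicitly.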