https://hgpu.org/?p=7596
An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)