A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization

Liu Li, Liu Li, Yang Guangwen
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Chinese Journal of Electronics, Vol.21, No.1, 2012

@article{liu2012highly,
   title={A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization},
   author={LIU, L. and YANG, G.},
   year={2012}
}


In this paper, we aim to accelerate sparse LU factorization on GPUs. We present a tiled storage format and a parallel algorithm that improve the memory access pattern, and a register blocking method that compresses the on-chip working set. The OpenMP implementation of our algorithm gives more stable performance across different matrices and outperforms SuperLU and KLU by 1.88 to 6 times on an Intel 8-core CPU (central processing unit) for matrices from the Florida matrix collection. Based on this algorithm, we further propose a GPU-CPU hybrid pipelined scheme that overlaps computations on the CPU with computations on the GPU. Compared to the better of SuperLU and KLU on an Intel 8-core CPU, our algorithm achieves a 1.1 to 19.7-fold speedup on the GPU in double precision. Compared to the OpenMP implementation of our algorithm on an Intel 8-core CPU, our GPU implementation achieves a 2-fold speedup in the best cases.
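The abstract does not spell out the tiled storage format, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes square t x t tiles, a matrix size divisible by t, dense storage inside each nonempty tile, and no pivoting (so it is only safe for matrices such as diagonally dominant ones). The names `to_tiles`, `tiled_lu`, and `to_dense` are hypothetical, and everything runs sequentially on the CPU.

```python
def to_tiles(dense, t):
    """Keep only the t x t tiles that contain a nonzero (assumed layout)."""
    n = len(dense)
    tiles = {}
    for bi in range(0, n, t):
        for bj in range(0, n, t):
            block = [row[bj:bj + t] for row in dense[bi:bi + t]]
            if any(v != 0.0 for r in block for v in r):
                tiles[(bi // t, bj // t)] = block
    return tiles

def lu_inplace(a):
    """Unpivoted LU of one tile: unit-lower L below the diagonal, U on and above."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]

def solve_lower_unit(lkk, b):
    """Overwrite b with L^{-1} b, where L is the unit-lower part of lkk."""
    for i in range(len(b)):
        for p in range(i):
            for j in range(len(b[0])):
                b[i][j] -= lkk[i][p] * b[p][j]

def solve_upper_right(ukk, b):
    """Overwrite b with b U^{-1}, where U is the upper part of ukk."""
    for j in range(len(ukk)):
        for r in range(len(b)):
            for p in range(j):
                b[r][j] -= b[r][p] * ukk[p][j]
            b[r][j] /= ukk[j][j]

def tiled_lu(tiles, nb, t):
    """Right-looking LU over the tile dictionary; empty tiles are skipped."""
    for k in range(nb):
        lu_inplace(tiles[(k, k)])
        row = [j for j in range(k + 1, nb) if (k, j) in tiles]
        col = [i for i in range(k + 1, nb) if (i, k) in tiles]
        for j in row:
            solve_lower_unit(tiles[(k, k)], tiles[(k, j)])
        for i in col:
            solve_upper_right(tiles[(k, k)], tiles[(i, k)])
        # Trailing updates touch distinct (i, j) tiles, so they are independent.
        for i in col:
            for j in row:
                # A missing target tile is fill-in and is allocated on demand.
                c = tiles.setdefault((i, j), [[0.0] * t for _ in range(t)])
                l, u = tiles[(i, k)], tiles[(k, j)]
                for r in range(t):
                    for p in range(t):
                        for q in range(t):
                            c[r][q] -= l[r][p] * u[p][q]

def to_dense(tiles, nb, t):
    """Expand the tile dictionary back into a dense list of lists."""
    out = [[0.0] * (nb * t) for _ in range(nb * t)]
    for (bi, bj), block in tiles.items():
        for r in range(t):
            for q in range(t):
                out[bi * t + r][bj * t + q] = block[r][q]
    return out

# Small diagonally dominant example with three empty 2 x 2 tiles.
A = [
    [4.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [1.0, 4.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 4.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 4.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 4.0, 1.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 4.0],
]
tiles = to_tiles(A, 2)
tiled_lu(tiles, 3, 2)
```

In this sketch the per-tile kernels work on small contiguous blocks, which is the kind of access pattern a tiled format is meant to encourage. The independent trailing updates in the innermost tile loop are where a parallel implementation would typically distribute work.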