Parallel GMRES implementation for solving sparse linear systems on GPU clusters
University of Franche-Comte, LIFC laboratory, Rue Engel-Gros, BP, Belfort Cedex, France
Proceedings of the 19th High Performance Computing Symposia (HPC ’11), 2011
@article{bahi2011parallel,
title={Parallel GMRES implementation for solving sparse linear systems on GPU clusters},
author={Bahi, J.M. and Couturier, R. and Khodja, L.Z.},
year={2011}
}
In this paper, we propose an efficient parallel implementation of the GMRES method for GPU clusters. The implementation parallelizes the GMRES algorithm across the CPUs of the cluster: all parallel, compute-intensive operations on local data are performed on the GPUs, while the reduction operations that compute global results are carried out by the CPUs. The performance of our parallel GMRES solver is evaluated on test matrices with more than 10^7 rows. The results show that solving large sparse linear systems is faster on a GPU cluster than on its CPU counterpart. In particular, a cluster of 12 GPUs is about 8 times faster than a cluster of 12 CPUs and about 5 times faster than a cluster of 24 CPUs.
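The division of work described in the abstract (local computation on each node's data, followed by a reduction to obtain a global result) can be illustrated with the dot products and norms that GMRES relies on. The following is a minimal sketch, not the authors' code: plain Python stands in for both the GPU kernels and the inter-node communication, and the names `split_rows`, `local_dot`, `global_dot`, and `global_norm` are hypothetical helpers introduced only for this illustration.

```python
# Illustrative sketch only: plain Python replaces the GPU kernels and the
# message passing between nodes. All function names here are hypothetical.

def split_rows(vector, num_nodes):
    """Partition a vector into contiguous blocks, one per (simulated) node."""
    base, extra = divmod(len(vector), num_nodes)
    blocks, start = [], 0
    for rank in range(num_nodes):
        # The first `extra` nodes receive one additional element.
        size = base + (1 if rank < extra else 0)
        blocks.append(vector[start:start + size])
        start += size
    return blocks


def local_dot(x_block, y_block):
    """Stand-in for the per-node GPU computation: uses local data only."""
    return sum(a * b for a, b in zip(x_block, y_block))


def global_dot(x, y, num_nodes):
    """Compute partial dot products per node, then reduce them to one value.

    The final sum plays the role of the CPU-side reduction across the cluster.
    """
    x_blocks = split_rows(x, num_nodes)
    y_blocks = split_rows(y, num_nodes)
    partials = [local_dot(xb, yb) for xb, yb in zip(x_blocks, y_blocks)]
    return sum(partials)


def global_norm(x, num_nodes):
    """Euclidean norm built from the same local-compute-then-reduce pattern."""
    return global_dot(x, x, num_nodes) ** 0.5
```

For example, `global_dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 3)` splits each vector into three blocks, computes three partial sums, and reduces them to a single scalar. In a real cluster setting the reduction step would typically be a collective communication between processes rather than a local `sum`.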
November 25, 2011 by hgpu