High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning
Tokyo Institute of Technology, 152-8552, Meguro-ku, Tokyo, Japan
Computer Science – Research and Development, Volume 25, Numbers 1-2, 83-91 (2 April 2010)
@article{cevahir2010high,
title={High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning},
author={Cevahir, A. and Nukada, A. and Matsuoka, S.},
journal={Computer Science -- Research and Development},
volume={25},
number={1--2},
pages={83--91},
issn={1865-2034},
year={2010},
publisher={Springer}
}
Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on GPUs, and communication is managed by the CPU. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver selects the fastest of several high-performance kernels running on GPUs. Scalability is harder to obtain on a GPU-extended cluster than on a traditional CPU cluster: because computation on GPUs is much faster than on CPUs, GPU-extended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that better optimizes for the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops of double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
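For readers unfamiliar with the method, the following is a minimal serial sketch of the textbook Conjugate Gradient iteration, with the sparse matrix-vector product written out explicitly since the abstract identifies it as the dominant cost. It is not the authors' multi-GPU implementation: the CSR storage format, the function names (`spmv_csr`, `conjugate_gradient`), the tolerance, and the small test matrix are all assumptions chosen for illustration, and no partitioning or communication is modeled.

```python
import math


def spmv_csr(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)  # residual r = b - A x, which equals b when x = 0
    p = list(r)  # initial search direction
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        # Stop once the relative residual norm is below the tolerance.
        if math.sqrt(rs_old) <= tol * b_norm:
            break
        ap = spmv_csr(indptr, indices, data, p)  # the one SpMV per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Illustrative SPD matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(indptr, indices, data, b)
```

Each iteration performs one sparse matrix-vector product, two inner products, and three vector updates. In a distributed setting the matrix-vector product requires exchanging entries of `p` between processes and the inner products require global reductions, which is where a partitioning model that reduces communication would apply.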
November 16, 2010 by hgpu