High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Ali Cevahir, Akira Nukada, Satoshi Matsuoka
Tokyo Institute of Technology, 152-8552, Meguro-ku, Tokyo, Japan
Computer Science – Research and Development, Volume 25, Numbers 1-2, 83-91 (2 April 2010)













Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on the GPUs, and communication is managed by the CPU. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver selects the fastest among several high-performance kernels running on the GPUs. Scalability is harder to obtain in a GPU-extended cluster than in a traditional CPU cluster: because computation on GPUs is much faster than on CPUs, GPU-extended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that is better suited to the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
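To make the abstract's ingredients concrete, the following is a minimal single-process sketch in pure Python, not the authors' GPU implementation. It shows an unpreconditioned CG iteration whose dominant cost is a CSR sparse matrix-vector product, plus a function that counts the communication volume a row-wise partition would incur. All names (`spmv_csr`, `conjugate_gradient`, `comm_volume`) are hypothetical, and the choice of the column-net connectivity-1 metric is an assumption: the abstract only says that hypergraph-partitioning models are used, without naming one.

```python
from math import sqrt

def spmv_csr(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix.

    Returns (x, iterations). Stops when the residual 2-norm drops below tol.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # search direction
    rs_old = dot(r, r)
    if sqrt(rs_old) < tol:
        return x, 0
    for it in range(1, max_iter + 1):
        ap = spmv_csr(indptr, indices, data, p)   # the dominant cost per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if sqrt(rs_new) < tol:
            return x, it
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

def comm_volume(indptr, indices, part):
    """Words of x sent per SpMV under a row-wise partition of a square matrix.

    Assumes x[j] is owned by the same part as row j. Each column j is treated
    as a net; it costs (number of distinct parts touching it) - 1.
    """
    n = len(indptr) - 1
    parts_of_col = [{part[j]} for j in range(n)]  # the owner always counts
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            parts_of_col[indices[k]].add(part[i])
    return sum(len(s) - 1 for s in parts_of_col)

# Example: 4x4 tridiagonal matrix with 2 on the diagonal and -1 off it.
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
b = [1.0, 0.0, 0.0, 1.0]    # equals A @ [1, 1, 1, 1]

x, iters = conjugate_gradient(indptr, indices, data, b)
vol_contiguous = comm_volume(indptr, indices, [0, 0, 1, 1])
vol_interleaved = comm_volume(indptr, indices, [0, 1, 0, 1])
```

On this small example the contiguous split `[0, 0, 1, 1]` needs 2 vector entries exchanged per multiplication, while the interleaved split `[0, 1, 0, 1]` needs 4. A partitioner searches for assignments that keep this count low while balancing the work per part, which matters more as the per-device compute time shrinks.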