high performance computing on graphics processing units: hgpu.org

High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Ali Cevahir, Akira Nukada, Satoshi Matsuoka

Tokyo Institute of Technology, 152-8552, Meguro-ku, Tokyo, Japan

Computer Science – Research and Development, Volume 25, Numbers 1-2, 83-91 (2 April 2010)

DOI:10.1007/s00450-010-0112-6

@article{cevahir2010high,
  title={High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning},
  author={Cevahir, A. and Nukada, A. and Matsuoka, S.},
  journal={Computer Science-Research and Development},
  volume={25},
  number={1},
  pages={83--91},
  issn={1865-2034},
  year={2010},
  publisher={Springer}
}

Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster in which each node has multiple GPUs. The basic computations of the solver are performed on the GPUs, and communication is managed by the CPU. For sparse matrix-vector multiplication, the most time-consuming operation, the solver selects the fastest of several high-performance GPU kernels. Scalability is harder to obtain on a GPU-extended cluster than on a traditional CPU cluster: because GPUs compute much faster than CPUs, communication between compute units must also be faster to keep pace. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that is better suited to the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops of double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
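To make the structure of the solver concrete, the following is a minimal single-process sketch of CG in plain Python, with the sparse matrix-vector product written out for a matrix in CSR format. It is not the authors' multi-GPU implementation: the CSR layout, the function names (`spmv_csr`, `conjugate_gradient`, `laplacian_1d_csr`), the tolerance, and the small test matrix are all assumptions chosen for illustration. It only shows why one sparse matrix-vector product per iteration dominates the cost.

```python
import math


def spmv_csr(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR format (assumed layout)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given in CSR format.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(rs_old) <= tol * b_norm:
            return x, it
        ap = spmv_csr(indptr, indices, data, p)  # the one SpMV per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


def laplacian_1d_csr(n):
    """Hypothetical test matrix: the n x n tridiagonal matrix (-1, 2, -1)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


# Build a right-hand side whose exact solution is the all-ones vector.
indptr, indices, data = laplacian_1d_csr(8)
b = spmv_csr(indptr, indices, data, [1.0] * 8)
x, iters = conjugate_gradient(indptr, indices, data, b)
print(iters, max(abs(xi - 1.0) for xi in x))
```

In a distributed setting, the rows of the matrix and the matching vector entries would be split across compute units, and each `spmv_csr` call would first need the entries of `p` owned by other units. That exchange is the communication the partitioning models described above aim to reduce.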

Tags: Computer science, Conjugate gradient solver, CUDA, GPU cluster, nVidia

November 16, 2010 by hgpu
