High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning
Tokyo Institute of Technology, 152-8552, Meguro-ku, Tokyo, Japan
Computer Science – Research and Development, Volume 25, Numbers 1-2, 83-91 (2 April 2010)
Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on the GPUs, while communication is managed by the CPU. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver selects the fastest among several high-performance GPU kernels. Achieving scalability on a GPU-extended cluster is more difficult than on a traditional CPU cluster, because the GPUs are very fast relative to the CPUs: since computation is faster, the cluster demands correspondingly faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that better exploits the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops of double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
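To illustrate the structure the abstract describes, the sketch below shows a minimal single-GPU CG iteration in CUDA, with a simple one-thread-per-row CSR SpMV kernel and cuBLAS used for the vector operations. This is only an illustrative assumption of the general scheme, not the paper's implementation: the multi-GPU decomposition, the selection among several SpMV kernels, and the hypergraph-partitioned, CPU-managed communication are not shown, and the kernel and function names are hypothetical.

// Minimal single-GPU CG sketch (hypothetical; the paper's multi-GPU solver,
// SpMV kernel selection, and partitioned communication are not shown).
#include <cmath>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// CSR sparse matrix-vector product y = A*x, one thread per row
// (the simplest of the SpMV variants such a solver could choose among).
__global__ void spmv_csr(int n, const int *rowPtr, const int *colIdx,
                         const double *val, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double sum = 0.0;
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += val[j] * x[colIdx[j]];
        y[row] = sum;
    }
}

// Solves A x = b for a symmetric positive definite CSR matrix already resident
// on the device; x holds the initial guess on entry and the solution on exit.
void cg_solve(cublasHandle_t h, int n, const int *dRowPtr, const int *dColIdx,
              const double *dVal, const double *db, double *dx,
              int maxIters, double tol) {
    double *dr, *dp, *dAp;
    cudaMalloc(&dr,  n * sizeof(double));
    cudaMalloc(&dp,  n * sizeof(double));
    cudaMalloc(&dAp, n * sizeof(double));

    int threads = 256, blocks = (n + threads - 1) / threads;
    double one = 1.0, minusOne = -1.0;

    // r = b - A*x ;  p = r
    spmv_csr<<<blocks, threads>>>(n, dRowPtr, dColIdx, dVal, dx, dAp);
    cublasDcopy(h, n, db, 1, dr, 1);
    cublasDaxpy(h, n, &minusOne, dAp, 1, dr, 1);
    cublasDcopy(h, n, dr, 1, dp, 1);

    double rsOld, rsNew;
    cublasDdot(h, n, dr, 1, dr, 1, &rsOld);

    for (int k = 0; k < maxIters && sqrt(rsOld) > tol; ++k) {
        spmv_csr<<<blocks, threads>>>(n, dRowPtr, dColIdx, dVal, dp, dAp);
        double pAp;
        cublasDdot(h, n, dp, 1, dAp, 1, &pAp);
        double alpha = rsOld / pAp, minusAlpha = -alpha;
        cublasDaxpy(h, n, &alpha, dp, 1, dx, 1);        // x += alpha * p
        cublasDaxpy(h, n, &minusAlpha, dAp, 1, dr, 1);  // r -= alpha * A*p
        cublasDdot(h, n, dr, 1, dr, 1, &rsNew);
        double beta = rsNew / rsOld;
        cublasDscal(h, n, &beta, dp, 1);                // p = r + beta * p
        cublasDaxpy(h, n, &one, dr, 1, dp, 1);
        rsOld = rsNew;
    }
    cudaFree(dr); cudaFree(dp); cudaFree(dAp);
}

The caller is assumed to create the cublasHandle_t and to have copied the CSR arrays and the right-hand side to the device. In the multi-GPU setting described by the paper, each such local SpMV would additionally require exchanging boundary vector entries between partitions, which is the communication the hypergraph-partitioning model is used to minimize.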
November 16, 2010 by hgpu