high performance computing on graphics processing units: hgpu.org

Fast Conjugate Gradients with Multiple GPUs

Ali Cevahir, Akira Nukada, Satoshi Matsuoka

Tokyo Institute of Technology

Computational Science – ICCS 2009 (2009), pp. 893-903

DOI:10.1007/978-3-642-01970-8_90

The limiting factor for the efficiency of sparse linear solvers is memory bandwidth. In this work, we describe a fast Conjugate Gradient (CG) solver for unstructured problems, which runs on multiple GPUs installed on a single mainboard. The solver achieves double-precision accuracy with single-precision GPUs by using a mixed-precision iterative refinement algorithm. To achieve high computation speed, we propose a fast algorithm for sparse matrix-vector multiplication, the core operation of iterative solvers. The proposed multiplication algorithm makes efficient use of GPU resources through caching, coalesced memory accesses, and load balancing across running threads. Experiments on a wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on a single GeForce 8800 GTS card, and that our CG implementation achieves up to 24.6 Gflops with four GPUs.
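The idea of mixed-precision iterative refinement can be illustrated with a small CPU-only sketch. This is not the authors' GPU implementation: it uses the textbook refinement loop, in which the residual is computed in double precision and a correction is obtained from an unpreconditioned CG solve carried out in emulated single precision. The CSR storage format, the function names (`f32`, `spmv`, `cg`, `refined_solve`, `poisson_1d`), the tolerances, and the residual scaling step are all assumptions made for illustration; the abstract specifies none of them.

```python
import struct

def f32(v):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def spmv(indptr, indices, data, x, rnd=float):
    """Return A @ x for a CSR matrix; rnd is applied after every operation."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc = rnd(acc + rnd(data[k] * x[indices[k]]))
        y.append(acc)
    return y

def dot(u, v, rnd=float):
    """Inner product with rnd applied after every multiply and add."""
    acc = 0.0
    for a, b in zip(u, v):
        acc = rnd(acc + rnd(a * b))
    return acc

def cg(indptr, indices, data, b, tol, max_iter, rnd=float):
    """Unpreconditioned CG for an SPD matrix in CSR form, starting from x = 0."""
    x = [0.0] * len(b)
    r = [rnd(v) for v in b]
    p = list(r)
    rs = dot(r, r, rnd)
    stop = tol * tol * rs  # stop once ||r|| <= tol * ||b||
    for _ in range(max_iter):
        if rs <= stop:
            break
        ap = spmv(indptr, indices, data, p, rnd)
        alpha = rnd(rs / dot(p, ap, rnd))
        x = [rnd(xi + rnd(alpha * pi)) for xi, pi in zip(x, p)]
        r = [rnd(ri - rnd(alpha * ai)) for ri, ai in zip(r, ap)]
        rs_new = dot(r, r, rnd)
        beta = rnd(rs_new / rs)
        p = [rnd(ri + rnd(beta * pi)) for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def refined_solve(indptr, indices, data, b, tol=1e-10, max_outer=10):
    """Iterative refinement: double-precision residual, single-precision CG."""
    data32 = [f32(v) for v in data]
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    for _ in range(max_outer):
        ax = spmv(indptr, indices, data, x)        # double precision
        r = [bi - ai for bi, ai in zip(b, ax)]
        r_norm = dot(r, r) ** 0.5
        if r_norm <= tol * b_norm:
            break
        # Scale the residual to unit norm so it stays well inside the
        # single-precision range (an assumption of this sketch).
        scaled = [ri / r_norm for ri in r]
        d = cg(indptr, indices, data32, scaled, 1e-4, 200, rnd=f32)
        x = [xi + r_norm * di for xi, di in zip(x, d)]
    return x

def poisson_1d(n):
    """CSR arrays for the n x n tridiagonal matrix with stencil (-1, 2, -1)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data
```

Calling `refined_solve(*poisson_1d(20), [1.0] * 20)` solves a small symmetric positive definite test system. The inner solver only needs a loose tolerance, because each outer pass recomputes the residual in double precision and corrects whatever error the single-precision solve left behind. On a GPU, `spmv` would be the bandwidth-bound kernel that the abstract identifies as the core operation.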

Tags: Computer science, Conjugate gradient solver, CUDA, Linear Algebra, Matrix multiplication, nVidia, nVidia GeForce 8800 GTS

November 27, 2010 by hgpu
