high performance computing on graphics processing units: hgpu.org


Fast Conjugate Gradients with Multiple GPUs

Ali Cevahir, Akira Nukada, Satoshi Matsuoka

Tokyo Institute of Technology

Computational Science – ICCS 2009 (2009), pp. 893-903

DOI:10.1007/978-3-642-01970-8_90

@article{cevahir2009fast,
  title={Fast conjugate gradients with multiple GPUs},
  author={Cevahir, A. and Nukada, A. and Matsuoka, S.},
  journal={Computational Science--ICCS 2009},
  pages={893--903},
  year={2009},
  publisher={Springer}
}


Memory bandwidth is the limiting factor for the efficiency of sparse linear solvers. In this work, we describe a fast Conjugate Gradient (CG) solver for unstructured problems that runs on multiple GPUs installed on a single mainboard. The solver achieves double precision accuracy with single precision GPUs by using a mixed precision iterative refinement algorithm. To achieve high computation speed, we propose a fast sparse matrix-vector multiplication algorithm, the core operation of iterative solvers. The proposed multiplication algorithm uses GPU resources efficiently through caching, coalesced memory accesses and load balancing between running threads. Experiments on a wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on a single GeForce 8800 GTS card and that our CG implementation achieves up to 24.6 Gflops with four GPUs.
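The two ideas in the abstract, a sparse matrix-vector product as the core kernel and double precision accuracy obtained from a single precision solver, can be illustrated with a small CPU sketch. It assumes NumPy `float32` stands in for the single precision GPU and `float64` for the double precision host, and it uses a plain row-by-row CSR product rather than the cached, coalesced, load-balanced GPU kernel the paper proposes. The function names (`csr_spmv`, `cg_single`, `mixed_precision_cg`, `tridiag_csr`), the tolerances and the tridiagonal test matrix are hypothetical choices for illustration only.

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A in CSR form; arithmetic stays in the dtype of data and x."""
    n = len(indptr) - 1
    y = np.zeros(n, dtype=np.result_type(data, x))
    for row in range(n):
        start, end = indptr[row], indptr[row + 1]
        # Dot product of this row's nonzeros with the gathered entries of x.
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y

def cg_single(indptr, indices, data32, b32, tol=1e-4, max_iter=500):
    """Inner solver: plain CG carried out entirely in float32 (the 'GPU' side)."""
    x = np.zeros_like(b32)
    r = b32.copy()
    p = r.copy()
    rs_old = np.dot(r, r)
    b_norm = np.sqrt(rs_old)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:  # loose relative tolerance is enough
            break
        Ap = csr_spmv(indptr, indices, data32, p)
        alpha = rs_old / np.dot(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.dot(r, r)
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def mixed_precision_cg(indptr, indices, data, b, tol=1e-12, max_outer=50):
    """Outer iterative refinement: residual and solution kept in float64."""
    data32 = data.astype(np.float32)  # single precision copy of the matrix
    x = np.zeros_like(b, dtype=np.float64)
    b_norm = np.linalg.norm(b)
    for _ in range(max_outer):
        r = b - csr_spmv(indptr, indices, data, x)  # residual in double precision
        if np.linalg.norm(r) <= tol * b_norm:
            break
        # Solve A d = r approximately in single precision, then correct x in double.
        d = cg_single(indptr, indices, data32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x

def tridiag_csr(n, diag=4.0, off=-1.0):
    """Hypothetical SPD test matrix: tridiagonal (off, diag, off) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, off), (i, diag), (i + 1, off)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices), np.array(data, dtype=np.float64)

if __name__ == "__main__":
    indptr, indices, data = tridiag_csr(100)
    b = np.ones(100)
    x = mixed_precision_cg(indptr, indices, data, b)
    rel = np.linalg.norm(b - csr_spmv(indptr, indices, data, x)) / np.linalg.norm(b)
    print(f"relative residual: {rel:.2e}")
```

The design point the sketch tries to show is the division of labor: the expensive inner iterations touch only `float32` data (half the memory traffic of `float64`, which matters when bandwidth is the bottleneck), while the cheap outer loop recomputes the residual in `float64` so that the accumulated solution can reach double precision accuracy, provided the matrix is not too ill-conditioned for single precision.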

Tags: Computer science, Conjugate gradient solver, CUDA, Linear Algebra, Matrix multiplication, nVidia, nVidia GeForce 8800 GTS

November 27, 2010 by hgpu
