high performance computing on graphics processing units: hgpu.org


Improving the Performance of CA-GMRES on Multicores with Multiple GPUs

Ichitaro Yamazaki, Hartwig Anzt, Stanimire Tomov, Mark Hoemmen, Jack Dongarra

University of Tennessee, Knoxville, USA

IPDPS, 2014

@article{yamazaki2014improving,
  title={Improving the Performance of CA-GMRES on Multicores with Multiple GPUs},
  author={Yamazaki, Ichitaro and Anzt, Hartwig and Tomov, Stanimire and Hoemmen, Mark and Dongarra, Jack},
  year={2014}
}


The Generalized Minimum Residual (GMRES) method is one of the most widely used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because communication is becoming increasingly expensive relative to floating-point operations on modern computers. Since graphics processing units (GPUs) are now becoming a crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present detailed performance studies of a matrix powers kernel on multiple GPUs, we focus in particular on orthogonalization strategies, which have a great impact on both the numerical stability and the performance of GMRES, especially as the matrix becomes sparser or more ill-conditioned. We present experimental results on two eight-core Intel Sandy Bridge CPUs with three NVIDIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication, both on a single GPU and between multiple GPUs. As part of our study, we investigate several optimization techniques for the GPU kernels that can also be used in iterative solvers other than GMRES. Hence, our studies not only emphasize the importance of avoiding communication on GPUs, but also provide insight into the effects of these optimization techniques on the performance of sparse solvers, and may have an impact beyond GMRES.
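The two building blocks named in the abstract, a matrix powers kernel followed by orthogonalization of the basis vectors it generates, can be illustrated with a minimal CPU-only sketch. This is not the authors' GPU implementation. The function names (`spmv`, `matrix_powers`, `orthonormalize`), the monomial basis, the small test matrix, and the use of modified Gram-Schmidt as a stand-in for the orthogonalization strategies studied in the paper are all assumptions made for illustration.

```python
# Illustrative sketch only: plain Python on the CPU, not the paper's GPU code.
import math


def spmv(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (col, value) pairs."""
    return [sum(val * x[col] for col, val in row) for row in rows]


def matrix_powers(rows, v, s):
    """Return the monomial basis [v, A v, ..., A^s v] as a list of vectors."""
    basis = [list(v)]
    for _ in range(s):
        basis.append(spmv(rows, basis[-1]))
    return basis


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def orthonormalize(basis):
    """Modified Gram-Schmidt over the whole block (assumed stand-in strategy)."""
    q = []
    for w in basis:
        w = list(w)
        for u in q:
            h = dot(u, w)
            w = [wi - h * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        q.append([wi / norm for wi in w])
    return q


# Hypothetical 4x4 nonsymmetric tridiagonal test matrix.
A = [
    [(0, 4.0), (1, -1.0)],
    [(0, -2.0), (1, 4.0), (2, -1.0)],
    [(1, -2.0), (2, 4.0), (3, -1.0)],
    [(2, -2.0), (3, 4.0)],
]
v0 = [1.0, 0.0, 0.0, 0.0]

V = matrix_powers(A, v0, 3)  # s = 3 products generated before orthogonalizing
Q = orthonormalize(V)        # the whole block is orthogonalized afterwards
```

The point of the arrangement is that all `s` products are generated first and the block is orthogonalized afterwards, rather than orthogonalizing after every single product as standard GMRES does. The monomial basis used here can become ill-conditioned as `s` grows, which is one reason the choice of orthogonalization strategy matters.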

Tags: Computer science, CUDA, Linear Algebra, nVidia, Performance, Tesla M2090

January 18, 2014 by hgpu
