high performance computing on graphics processing units: hgpu.org

Mixed-precision Orthogonalization Scheme and Adaptive Step Size for CA-GMRES on GPUs

Ichitaro Yamazaki, Stanimire Tomov, Tingxing Dong, Jack Dongarra

University of Tennessee, Knoxville, U.S.A.

University of Tennessee, Technical report ut-eecs-14-730, 2014

@article{yamazaki2014mixed,
  title={Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for CA-GMRES on GPUs},
  author={Yamazaki, Ichitaro and Tomov, Stanimire and Dong, Tingxing and Dongarra, Jack},
  year={2014}
}

We propose a mixed-precision orthogonalization scheme that takes the input matrix in standard 32-bit or 64-bit floating-point precision but accumulates its intermediate results in doubled precision. For a 64-bit input matrix, we use software emulation for the higher-precision arithmetic. Compared with the standard orthogonalization scheme, we require about 8.5× more computation but a much smaller increase in communication. Since computation is becoming less expensive relative to communication on new and emerging architectures, the relative cost of our mixed-precision scheme is decreasing. Our case studies with CA-GMRES on a GPU demonstrate that using mixed precision for this small but critical segment of CA-GMRES can improve not only its overall numerical stability but also, in some cases, its performance. We also study an adaptive scheme that dynamically adjusts the step size of the matrix powers kernel. Our experiments on multiple GPUs show that a near-optimal step size can be chosen based on performance measurements from the first restart loop of CA-GMRES.
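To illustrate what accumulating 64-bit inputs in doubled precision through software emulation can look like, here is a minimal sketch of a dot product built on error-free transformations (the Dot2 algorithm of Ogita, Rump, and Oishi). The abstract does not say which emulation technique the authors use, so this choice is an assumption, and all function names (`two_sum`, `split`, `two_prod`, `dot_doubled`, `dot_plain`) are hypothetical. It runs on the CPU in plain Python and makes no attempt to reproduce the GPU kernels.

```python
def two_sum(a, b):
    """Error-free addition: s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into two non-overlapping halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Error-free product: p = fl(a * b) and p + e == a * b
    (assuming no overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dot_doubled(x, y):
    """Dot product whose rounding errors are tracked in a second double,
    giving a result about as accurate as if computed in twice the
    working precision and rounded once (assumed emulation technique)."""
    s = 0.0  # running sum in working precision
    c = 0.0  # accumulated rounding errors
    for xi, yi in zip(x, y):
        p, ep = two_prod(xi, yi)
        s, es = two_sum(s, p)
        c += ep + es
    return s + c


def dot_plain(x, y):
    """Standard dot product accumulated in working precision only."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return s


if __name__ == "__main__":
    # Cancellation example: the exact result is 1.
    x = [1e16, 1.0, -1e16]
    y = [1.0, 1.0, 1.0]
    print("plain:  ", dot_plain(x, y))
    print("doubled:", dot_doubled(x, y))
```

Each multiply-add in `dot_doubled` expands to many more floating-point operations than in `dot_plain`, while the vectors are still read only once. That is consistent with the abstract's point that the extra cost falls mainly on computation rather than on data movement.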

Tags: Computer science, Linear Algebra, nVidia, Performance, Tesla K20, Tesla K40, Tesla M2090

July 1, 2014 by hgpu
