Performance Evaluation of Mixed Precision Algorithms for Solving Sparse Linear Systems

Mawussi Zounon, Nicholas J. Higham, Craig Lucas, Françoise Tisseur

Manchester Institute for Mathematical Sciences, School of Mathematics

The University of Manchester, 2020

It is well established that mixed precision algorithms that factorize a matrix at a precision lower than the working precision can reduce the execution time and the energy consumption of parallel solvers for dense linear systems. Much less is known about the efficiency of mixed precision parallel algorithms for sparse linear systems, and existing work focuses on single-core experiments. We evaluate the benefits of using single precision arithmetic to solve double precision sparse linear systems on multiple cores, focusing on the key components of LU factorization and matrix–vector products. We find that single precision sparse LU factorization is prone to a severe loss of performance due to the intrusion of subnormal numbers. We identify a mechanism that allows cascading fill-ins to generate subnormal numbers and show that automatically flushing subnormals to zero avoids the performance penalties. Our results show that the anticipated speedup of 2 over a double precision LU factorization is obtained only for the very largest of our test problems. For iterative solvers, we find that for the majority of the matrices, computing or applying incomplete factorization preconditioners in single precision does not offer sufficient performance benefits to justify the loss of accuracy compared with double precision. We also find that using single precision for the matrix–vector product kernels provides an average speedup of 1.5 over double precision kernels, but new mixed precision algorithms are needed to exploit this benefit without losing the performance gain while refining the solution to double precision accuracy.
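To make the subnormal issue concrete, the following is a minimal sketch, not the mechanism identified in the paper. It assumes that a chain of products of small entries is a reasonable stand-in for cascading fill-in updates, and it emulates single precision by rounding through Python's `struct` module. The helper names (`to_f32`, `is_subnormal_f32`, `flush_to_zero`, `cascade`) and the factor `1e-4` are hypothetical choices made for illustration only.

```python
import struct

# Smallest positive normal binary32 value (about 1.1755e-38).
F32_MIN_NORMAL = 2.0 ** -126


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def is_subnormal_f32(x):
    """True if x is nonzero but smaller in magnitude than the smallest normal binary32."""
    return x != 0.0 and abs(x) < F32_MIN_NORMAL


def flush_to_zero(x):
    """Emulate flush-to-zero: replace a subnormal result by zero."""
    return 0.0 if is_subnormal_f32(x) else x


def cascade(start, factor, steps, ftz=False):
    """Repeatedly multiply by `factor` in emulated single precision.

    Hypothetical stand-in for a chain of updates with small entries.
    Returns the value obtained after each step.
    """
    f = to_f32(factor)
    x = to_f32(start)
    values = []
    for _ in range(steps):
        # The product of two binary32 values is exact in binary64,
        # so rounding it once gives the correctly rounded binary32 product.
        x = to_f32(x * f)
        if ftz:
            x = flush_to_zero(x)
        values.append(x)
    return values


if __name__ == "__main__":
    plain = cascade(1.0, 1e-4, 12)
    flushed = cascade(1.0, 1e-4, 12, ftz=True)
    for step, (p, q) in enumerate(zip(plain, flushed), start=1):
        tag = "subnormal" if is_subnormal_f32(p) else ""
        print(f"step {step:2d}  no FTZ: {p:.6e}  FTZ: {q:.6e}  {tag}")
```

In this sketch the same chain evaluated in double precision stays far above the double precision underflow threshold, which is consistent with the abstract's observation that the problem appears when the factorization is moved to single precision. On real hardware, flushing to zero is typically enabled through a processor control flag or a compiler option rather than by an explicit check on each result as done here.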

Tags: Algorithms, Computer science, CUDA, Factorization, Mixed precision, nVidia, Tesla P100, Tesla V100

September 27, 2020 by hgpu

