Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU

Ichitaro Yamazaki, Stanimire Tomov, Jack Dongarra
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, U.S.A.
ACM Transactions on Mathematical Software (TOMS), 2016








Singular Value QR (SVQR) can orthonormalize a set of dense vectors with minimal communication: it needs one global reduction between the parallel processing units and uses BLAS-3 for most of its local computation. As a result, compared to other orthogonalization schemes, SVQR obtains superior performance on many current computers, where communication has become significantly more expensive than arithmetic operations. In this paper, we study the stability and performance of various SVQR implementations on multicore CPUs with a GPU. Our focus is on the dense triangular solve, which performs half of the total floating-point operations of SVQR. As part of this study, we examine an adaptive mixed-precision variant of SVQR, which decides at run time whether lower-precision arithmetic can be used for the triangular solve without increasing the order of its orthogonality error (though its backward error is significantly greater). If the greater backward error can be tolerated, then our performance results with an NVIDIA Kepler GPU show that the mixed-precision SVQR can obtain a speedup of up to 1.36 over the standard SVQR.
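
For readers unfamiliar with the scheme, the following is a minimal NumPy sketch of the basic SVQR steps the abstract refers to: form the Gram matrix (the single global reduction in a distributed setting), take its SVD, build a triangular factor from it, and finish with a triangular solve. It is an illustration under stated assumptions, not the authors' implementation: the function name `svqr` and the `solve_dtype` parameter are hypothetical, the sketch omits any scaling or safeguards a production code may apply, it does not reproduce the paper's run-time criterion for choosing the precision, and it uses NumPy's general `np.linalg.solve` as a stand-in for a dedicated BLAS-3 triangular solve.

```python
import numpy as np


def svqr(V, solve_dtype=None):
    """Sketch of SVQR: return (Q, R) with V ~= Q @ R and Q^T Q ~= I.

    solve_dtype is a hypothetical knob: passing np.float32 runs only the
    final solve in lower precision, mimicking a mixed-precision variant.
    """
    # Gram matrix. With V distributed by rows, this product is the one
    # global reduction the method needs.
    B = V.T @ V

    # B is symmetric positive semidefinite, so its SVD is B = U diag(s) U^T.
    U, s, _ = np.linalg.svd(B)

    # QR of sqrt(diag(s)) U^T yields an upper-triangular R with R^T R = B.
    _, R = np.linalg.qr(np.sqrt(s)[:, None] * U.T)

    # Q = V R^{-1}, computed as the solution of R^T Q^T = V^T.
    # A general solver stands in here for a triangular solve.
    dt = V.dtype if solve_dtype is None else solve_dtype
    Q = np.linalg.solve(R.T.astype(dt), V.T.astype(dt)).T
    return Q, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.standard_normal((1000, 8))

    Q64, _ = svqr(V)
    Q32, _ = svqr(V, solve_dtype=np.float32)

    eye = np.eye(V.shape[1])
    print("orthogonality error, full precision solve:",
          np.linalg.norm(Q64.T @ Q64 - eye))
    print("orthogonality error, float32 solve:      ",
          np.linalg.norm(Q32.T.astype(np.float64) @ Q32 - eye))
```

Running the script prints the orthogonality error `||Q^T Q - I||` for both variants, which gives a rough feel for the trade-off the abstract describes; the actual error behaviour depends strongly on the conditioning of the input vectors.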
