A Framework for Dense Triangular Matrix Kernels on Various Manycore Architectures

Ali Charara, David Keyes, Hatem Ltaief
Extreme Computing Research Center, King Abdullah University of Science and Technology, Thuwal, Jeddah 23955, Saudi Arabia
King Abdullah University of Science and Technology, 2016


We present a new high-performance framework for dense triangular BLAS kernels, namely triangular matrix-matrix multiplication (TRMM) and triangular solve with multiple right-hand sides (TRSM), on various manycore architectures. This work extends a previous single-GPU study by the same authors (Charara et al., EuroPar, 2016). Here, single-GPU performance is further improved by customized CUDA kernels for TRMM and TRSM, which are called at the bottom of the recursion tree. In addition, we propose a multi-GPU implementation of TRMM and TRSM that scales almost linearly as the number of GPUs increases. Finally, because the recursive algorithmic formulation of these triangular BLAS kernels is oblivious to the target hardware architecture, we also port the recursive kernels to homogeneous x86 architectures by relying on vendor-optimized BLAS implementations. Results reported on various hardware architectures show significant performance improvements over state-of-the-art implementations. These new kernels are freely available in the open-source KAUST BLAS (KBLAS) library.
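The recursive formulation mentioned above can be illustrated with a small sketch. It is not the KBLAS code: it assumes the left-side, lower-triangular, non-transposed, non-unit-diagonal variant only, uses plain Python lists instead of BLAS calls, and the names `trsm_lower`, `trmm_lower`, `gemm_update` and the cutoff `BASE` are hypothetical. Each call splits the diagonal block in half, recurses on the two triangular halves, and pushes the off-diagonal work into a GEMM-like update. In the approach described above, that update is where most of the flops go, and the base case is where a customized CUDA kernel would be called.

```python
# Hypothetical sketch of recursive TRSM/TRMM (left, lower, no-transpose,
# non-unit diagonal). Matrices are lists of rows; L is n x n, B is n x m.
# Not KBLAS code: the base case is a naive loop standing in for a tuned kernel.

BASE = 2  # hypothetical recursion cutoff (block size handled directly)


def gemm_update(L, B, rows, cols, alpha):
    """B[rows, :] += alpha * L[rows, cols] @ B[cols, :], in place.

    `rows` and `cols` are disjoint, so the rows being read are never written.
    """
    m = len(B[0])
    for i in rows:
        for k in cols:
            lik = alpha * L[i][k]
            if lik != 0.0:
                row_i, row_k = B[i], B[k]
                for j in range(m):
                    row_i[j] += lik * row_k[j]


def trsm_lower(L, B, lo=0, hi=None):
    """Solve L[lo:hi, lo:hi] X = B[lo:hi, :] in place (X overwrites B)."""
    if hi is None:
        hi = len(L)
    n = hi - lo
    if n <= BASE:
        # Base case: forward substitution; rows above i are already solved.
        m = len(B[0])
        for i in range(lo, hi):
            for k in range(lo, i):
                for j in range(m):
                    B[i][j] -= L[i][k] * B[k][j]
            for j in range(m):
                B[i][j] /= L[i][i]
        return
    mid = lo + n // 2
    trsm_lower(L, B, lo, mid)                                # X1 = L11^{-1} B1
    gemm_update(L, B, range(mid, hi), range(lo, mid), -1.0)  # B2 -= L21 X1
    trsm_lower(L, B, mid, hi)                                # X2 = L22^{-1} B2


def trmm_lower(L, B, lo=0, hi=None):
    """Compute B[lo:hi, :] := L[lo:hi, lo:hi] @ B[lo:hi, :] in place."""
    if hi is None:
        hi = len(L)
    n = hi - lo
    if n <= BASE:
        # Base case: process rows bottom-up so rows above i are still unmodified.
        m = len(B[0])
        for i in range(hi - 1, lo - 1, -1):
            for j in range(m):
                s = L[i][i] * B[i][j]
                for k in range(lo, i):
                    s += L[i][k] * B[k][j]
                B[i][j] = s
        return
    mid = lo + n // 2
    trmm_lower(L, B, mid, hi)                                # B2 := L22 B2
    gemm_update(L, B, range(mid, hi), range(lo, mid), 1.0)   # B2 += L21 B1 (B1 still original)
    trmm_lower(L, B, lo, mid)                                # B1 := L11 B1
```

The order of the TRMM steps matters for the in-place update: the bottom block is finished before the top block is overwritten, because B2 := L21 B1 + L22 B2 still needs the original B1. Nothing in the recursion refers to the hardware. Only the base case and the GEMM update would change when targeting a GPU, several GPUs, or a vendor BLAS on x86.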