Toward Accelerating the Matrix Inversion Computation of Symmetric Positive-Definite Matrices on Heterogeneous GPU-Based Systems

Huda Ibeid, Dinesh Kaushik, David Keyes, Hatem Ltaief
Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, KSA
HiPC 2011 Student Research Symposium, 2011





The goal of this paper is to implement an efficient matrix inversion of symmetric positive-definite matrices on heterogeneous GPU-based systems. The matrix inversion procedure can be split into three stages: computing the Cholesky factorization, inverting the Cholesky factor, and calculating the product of the inverted Cholesky factor with its transpose to obtain the final inverted matrix. Using a high-performance data layout, which represents the matrix in system memory in an optimized cache-aware format, the computation of the three stages is decomposed into fine-grained computational tasks. The data-flow programming model can then be represented as a directed acyclic graph, where nodes represent tasks and edges the dependencies between them. Standard implementations of matrix inversion, as well as of other numerical algorithms (e.g., linear and eigenvalue solvers) available in state-of-the-art numerical libraries (e.g., LAPACK), rely on the expensive fork-join paradigm to achieve parallel performance and are characterized by artifactual synchronization points, which have to be removed to fully exploit the underlying hardware capabilities. Our tile algorithmic approach removes those bottlenecks and executes tasks as soon as their data dependencies are satisfied. A hybrid runtime environment becomes paramount to dynamically schedule the numerical kernels on the available processing units, whether a hardware accelerator (i.e., a GPU) or a homogeneous multicore (i.e., x86), and this scheduling is transparent to the user. Preliminary results are shown on a dual-socket quad-core Intel Xeon 2.67GHz workstation with two NVIDIA Fermi C2070 GPU cards.
Our implementation (448 Gflop/s) achieves up to 5-fold and 6-fold improvements over the equivalent routines from MAGMA V1.0 and PLASMA V2.4, respectively, and a 10-fold improvement over LAPACK V3.2 linked with multithreaded Intel MKL BLAS V10.2, for a matrix size of 24960×24960.
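The three stages described above correspond to the classical LAPACK kernel sequence POTRF (Cholesky factorization), TRTRI (triangular inversion), and a triangular product. A minimal single-threaded sketch using SciPy's LAPACK wrappers — illustrating the mathematical procedure only, not the paper's tile-based GPU implementation — might look like this:

```python
import numpy as np
from scipy.linalg import lapack

def spd_inverse(a):
    """Invert a symmetric positive-definite matrix A in three stages."""
    # Stage 1: Cholesky factorization A = L * L^T (LAPACK POTRF).
    l, info = lapack.dpotrf(a, lower=1, clean=1)
    assert info == 0, "matrix is not positive definite"
    # Stage 2: invert the Cholesky factor, L -> L^{-1} (LAPACK TRTRI).
    l_inv, info = lapack.dtrtri(l, lower=1)
    assert info == 0
    l_inv = np.tril(l_inv)  # keep only the triangular part
    # Stage 3: form the product A^{-1} = L^{-T} * L^{-1}.
    return l_inv.T @ l_inv

a = np.array([[4.0, 1.0],
              [1.0, 3.0]])
print(spd_inverse(a) @ a)  # approximately the identity matrix
```

In the paper's approach, each of these three stages is further decomposed into fine-grained tasks operating on tiles of the matrix, and a runtime scheduler dispatches each task to a CPU core or a GPU once its input tiles are ready.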
