high performance computing on graphics processing units: hgpu.org

Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

Ang Li, Radu Serban, Dan Negrut

Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706

arXiv:1509.07919 [cs.DC], (25 Sep 2015)

@article{li2015analysis,

title={Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards},

author={Li, Ang and Serban, Radu and Negrut, Dan},

year={2015},

month={sep},

archivePrefix={"arXiv"},

primaryClass={cs.DC}

}

Source code package: SaP GPU

We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large; i.e., $10000 \leq N \leq 500000$. The ${\it split and parallelize}$ (${\tt SaP}$) approach seeks to partition the matrix ${\bf A}$ into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks ${\bf A}_i$. This approach, along with the Krylov subspace-based iterative method that it preconditions, is implemented in a solver called ${\tt SaP::GPU}$, which is compared in terms of efficiency with three commonly used sparse direct solvers: ${\tt PARDISO}$, ${\tt SuperLU}$, and ${\tt MUMPS}$. ${\tt SaP::GPU}$, which runs entirely on the GPU except for several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel’s ${\tt MKL}$, ${\tt SaP::GPU}$ also fares well when used to solve dense banded systems that are close to being diagonally dominant. ${\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.
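The variant of the idea that ignores the coupling matrices can be illustrated with a small CPU-only sketch. It is not the SaP::GPU API: every function name below is hypothetical, the blocks are factored sequentially rather than in parallel on a GPU, the LU factorization omits pivoting on the assumption that the blocks are diagonally dominant, and BiCGStab is chosen here only as one example of a Krylov subspace method that such a preconditioner could accelerate.

```python
def build_banded(n):
    """Illustrative nonsymmetric, diagonally dominant tridiagonal matrix."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -0.5
    return A

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lu_factor(B):
    """LU without pivoting (assumes a diagonally dominant block)."""
    m = len(B)
    LU = [row[:] for row in B]
    for k in range(m):
        for i in range(k + 1, m):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, m):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def lu_solve(LU, rhs):
    m = len(LU)
    y = rhs[:]
    for i in range(m):                 # forward substitution, unit lower factor
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(m)):       # back substitution, upper factor
        for j in range(i + 1, m):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y

def split_and_factor(A, P):
    """Split A into P diagonal sub-blocks A_i and factor each independently."""
    n = len(A)
    bounds = [(p * n // P, (p + 1) * n // P) for p in range(P)]
    factors = [lu_factor([row[lo:hi] for row in A[lo:hi]]) for lo, hi in bounds]
    return bounds, factors

def apply_preconditioner(bounds, factors, r):
    """Solve with each factored block; the coupling blocks are ignored."""
    z = []
    for (lo, hi), LU in zip(bounds, factors):
        z.extend(lu_solve(LU, r[lo:hi]))
    return z

def bicgstab(A, b, precond, tol=1e-10, max_iter=200):
    """Right-preconditioned BiCGStab starting from x = 0."""
    n = len(b)
    x, r = [0.0] * n, b[:]
    r_hat = r[:]
    rho = alpha = omega = 1.0
    v, p = [0.0] * n, [0.0] * n
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = precond(p)
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        x = [xi + alpha * yi for xi, yi in zip(x, y)]
        if dot(s, s) ** 0.5 <= tol * b_norm:
            return x, it
        z = precond(s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + omega * zi for xi, zi in zip(x, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        rho = rho_new
    return x, max_iter

# Toy sizes for illustration; the abstract targets N between 10000 and 500000.
N, P = 12, 3
A = build_banded(N)
x_true = [float(i + 1) for i in range(N)]
b = matvec(A, x_true)
bounds, factors = split_and_factor(A, P)
x, iters = bicgstab(A, b, lambda r: apply_preconditioner(bounds, factors, r))
max_err = max(abs(xi - ti) for xi, ti in zip(x, x_true))
print("iterations:", iters, "max error:", max_err)
```

Because each block solve touches only its own slice of the residual, the loop in `apply_preconditioner` has no dependencies between blocks, which is the property that makes the factorization and solve stages candidates for parallel execution.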

Tags: Computer science, CUDA, Linear Algebra, nVidia, Package, Sparse direct solvers, Tesla K20

September 30, 2015 by hgpu
