high performance computing on graphics processing units: hgpu.org


Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts

Joshua Dennis Booth, Sivasankaran Rajamanickam, Heidi K. Thornquist

Sandia National Laboratories, Albuquerque, New Mexico

arXiv:1601.05725 [cs.DC], (21 Jan 2016)

@article{booth2016basker,
  title={Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts},
  author={Booth, Joshua Dennis and Rajamanickam, Sivasankaran and Thornquist, Heidi K.},
  year={2016},
  month={jan},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}


Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to parallelize the Gilbert-Peierls algorithm for sparse LU factorization. As architectures evolve, there exists a need for algorithms that are hierarchical in nature to match the hierarchy in thread teams, individual threads, and vector level parallelism. Basker is designed to map well to this hierarchy in architectures. There is also a need for data layouts to match multiple levels of hierarchy in memory. Basker uses a two-dimensional hierarchical structure of sparse matrices that maps to the hierarchy in the memory architectures and to the hierarchy in parallelism. We present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulations. Basker achieves a geometric mean speedup of 5.91x on CPU (16 cores) and 7.4x on Xeon Phi (32 cores) relative to KLU. Basker outperforms Intel MKL Pardiso (PMKL) by as much as 53x on CPU (16 cores) and 13.3x on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides 5.4x speedup on a challenging matrix sequence taken from an actual Xyce simulation.
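For readers unfamiliar with the Gilbert-Peierls approach mentioned in the abstract, the following is a minimal serial sketch of its core idea: a left-looking, column-by-column LU factorization in which each column is obtained from a sparse triangular solve whose nonzero pattern is found by a depth-first search on the graph of L. It is not Basker's parallel algorithm and not the authors' code. As a simplifying assumption it omits the partial pivoting that Gilbert-Peierls performs, so it only works when every diagonal pivot is nonzero. The names `reach`, `sparse_lu`, and `lu_product`, and the list-of-dicts column storage, are hypothetical choices made for illustration.

```python
def reach(L_cols, b_rows, j):
    """Return the rows reachable from b_rows in the graph of L, in topological order.

    Edge k -> i exists when L[i, k] != 0. Only columns k < j are finished,
    so only those have outgoing edges. Recursive DFS: fine for small examples.
    """
    visited = set()
    postorder = []

    def dfs(k):
        visited.add(k)
        if k < j:
            for i in L_cols[k]:
                if i not in visited:
                    dfs(i)
        postorder.append(k)

    for r in b_rows:
        if r not in visited:
            dfs(r)
    return postorder[::-1]  # reverse postorder is a topological order


def sparse_lu(A_cols, n):
    """Left-looking LU without pivoting (assumption: all pivots are nonzero).

    A_cols[j] maps row index -> value for column j of A.
    Returns (L_cols, U_cols); L has an implicit unit diagonal.
    """
    L_cols = [dict() for _ in range(n)]
    U_cols = [dict() for _ in range(n)]
    for j in range(n):
        x = dict(A_cols[j])  # working copy of column j
        # Sparse triangular solve: visit only the structurally nonzero entries.
        for k in reach(L_cols, list(x), j):
            if k >= j:
                continue  # column k of L is not computed yet
            xk = x.get(k, 0.0)
            if xk == 0.0:
                continue
            for i, lik in L_cols[k].items():
                x[i] = x.get(i, 0.0) - lik * xk
        pivot = x.get(j, 0.0)
        if pivot == 0.0:
            raise ZeroDivisionError(f"zero pivot in column {j}; pivoting is required")
        for i, v in x.items():
            if i <= j:
                U_cols[j][i] = v          # upper part goes to U
            else:
                L_cols[j][i] = v / pivot  # scaled lower part goes to L
    return L_cols, U_cols


def lu_product(L_cols, U_cols, n):
    """Form the dense product L @ U, useful for checking a factorization."""
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k, ukj in U_cols[j].items():
            P[k][j] += ukj  # contribution of the unit diagonal of L
            for i, lik in L_cols[k].items():
                P[i][j] += lik * ukj
    return P
```

Calling `sparse_lu` on a small matrix stored as one dict per column and then comparing `lu_product(L_cols, U_cols, n)` against the original entries is a quick way to check the sketch. The dict-based storage favors readability; a production solver would use compressed column arrays and an iterative depth-first search.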

Tags: Algorithms, Computer science, Factorization, Intel Xeon Phi, Sparse direct solvers, Sparse matrix

January 22, 2016 by hgpu

