high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Multifrontal Factorization of Sparse SPD Matrices on GPUs

Multifrontal Factorization of Sparse SPD Matrices on GPUs

Thomas George, Vaibhav Saxena, Anshul Gupta, Amik Singh, Anamitra R. Choudhury

High Performance Computing Group, IBM Research India, New Delhi, India 110070

IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2011

DOI:10.1109/IPDPS.2011.44

BibTeX

Download (PDF)

View

Source

1761

views

Solving large sparse linear systems is often the most computationally intensive component of many scientific computing applications. In the past, sparse multifrontal direct factorization has been shown to scale to thousands of processors on dedicated supercomputers resulting in a substantial reduction in computational time. In recent years, an alternative computing paradigm based on GPUs has gained prominence, primarily due to its affordability, power-efficiency, and the potential to achieve significant speedup relative to desktop performance on regular and structured parallel applications. However, sparse matrix factorization on GPUs has not been explored sufficiently due to the complexity involved in an efficient implementation and concerns of low GPU utilization. In this paper, we present an adaptive hybrid approach for accelerating sparse multifrontal factorization based on a judicious exploitation of the processing power of the host CPU and GPU. We present four different policies for distributing and scheduling the workload between the host CPU and the GPU, and propose a mechanism for a runtime selection of the appropriate policy for each step of sparse Cholesky factorization. This mechanism relies on auto-tuning based on modeling the best policy predictor as a parametric classifier. We estimate the classifier parameters from the available empirical computation time data such that the expected computation time is minimized. This approach is readily adaptable for using the current or an extended set of policies for different CPU-GPU combinations as well as for different combinations of dense kernels for both the CPU and the GPU.

Tags: Computer science, CUDA, Factorization, nVidia, Sparse matrix, Tesla S1070

December 26, 2011 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Multifrontal Factorization of Sparse SPD Matrices on GPUs

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Multifrontal Factorization of Sparse SPD Matrices on GPUs

Share this:

Recent source codes

Most viewed papers (last 30 days)