high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Parallel Sparse Linear Algebra for Multi-core and Many-core Platforms: Parallel Solvers and Preconditioners

Parallel Sparse Linear Algebra for Multi-core and Many-core Platforms: Parallel Solvers and Preconditioners

Dimitar Lukarski

Institut fur Angewandte und Numerische Mathematik

Institut fur Angewandte und Numerische Mathematik, 2012

BibTeX

Download (PDF)

View

Source

2081

views

Partial differential equations are typically solved by means of finite difference, finite volume or finite element methods resulting in large, highly coupled, ill-conditioned and sparse (non-)linear systems. In order to minimize the computing time we want to exploit the capabilities of modern parallel architectures. The rapid hardware shifts from single core to multi-core and many-core processors lead to a gap in the progression of algorithms and programming environments for these platforms – the parallel models for large clusters do not fully utilize the performance capability of the multi-core CPUs and especially of the GPUs. Software stack needs to run adequately on the next generation of computing devices in order to exploit the potential of these new systems. Moving numerical software from one platform to another becomes an important task since every parallel device has its own programming model and language. The greatest challenge is to provide new techniques for solving (non-)linear systems that combine scalability, portability, fine-grained parallelism and flexibility across the assortment of parallel platforms and programming models. The goal of this thesis is to provide new fine-grained parallel algorithms embedded in advanced sparse linear algebra solvers and preconditioners on the emerging multi-core and many-core technologies. With respect to the mathematical methods, we focus on efficient iterative linear solvers. Here, we consider two types of solvers – out-of-the-box solvers such as preconditioned Krylov subspace solvers (e.g. CG, BiCGStab, GMRES), and problem-aware solvers such as geometric matrix-based multi-grid methods. Clearly, the majority of the solvers can be written in terms of sparse matrixvector and vector-vector operations which can be performed in parallel. Our aim is to provide parallel, generic and portable preconditioners which are suitable for multi-core and many-core devices. We focus on additive (e.g. Gauss-Seidel, SOR), multiplicative (ILU factorization with or without fill-ins) and approximate inverse preconditioners. The preconditioners can also be used as smoothing schemes in the multi-grid methods via a preconditioned defect correction step. We treat the additive splitting schemes by a multi-coloring technique to provide the necessary level of parallelism. For controlling the fill-in entries for the ILU factorization we propose a novel method which we call the power(q)-pattern method. We prove that this algorithm produces a new matrix structure with diagonal blocks containing only diagonal entries. This approach provides higher degrees of parallelism in comparison with the level-scheduling/topological sort algorithm. With these techniques we can perform the forward and backward substitution of the preconditioning step in parallel. By formulating the algorithm in block-matrix form we can execute the sweeps in parallel only by performing matrix-vector multiplications. Thus, we can express the data-parallelism in the sweeps without any specification of the underlying hardware or programming models. In object-oriented languages, an abstraction separates the object behavior from its implementation. Based on this abstraction, we have developed a linear algebra toolbox which supports several platforms such as multi-core CPUs, GPUs and accelerators. The various backends (sequential, OpenMP, CUDA, OpenCL) consist of optimized and platform-specific matrix and vector routines. Using unified interfaces across all platforms, the library allows users to build linear solvers and preconditioners without any information about the underlying hardware. With this technique, we can write our solvers and preconditioners in a single source code for all platforms. Furthermore, we can extend the library by adding new platforms without modifying the existing solvers and preconditioners. In our tests we consider two scenarios – preconditioned Krylov subspace methods and matrixbased multi-grid methods. We demonstrate speed ups in two directions: first, the preconditioners/smoothers reduce the total solution time by decreasing the number of iterations, and second, the preconditioning/smoothing phase is efficiently executed in parallel providing good scalability across several parallel architectures. We present numerical experiments and performance analysis on several platforms such as multi-core CPU and GPU devices. Furthermore, we show the viability and benefit of the proposed preconditioning schemes and software approach.

Tags: Algorithms, Computer science, CUDA, Differential equations, Factorization, FEM, Finite difference, Finite element method, Linear Algebra, nVidia, nVidia GeForce GTX 580, Partial differential equations, PDEs, Sparse matrix, Tesla M1060, Tesla S1070, Thesis

March 16, 2012 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Parallel Sparse Linear Algebra for Multi-core and Many-core Platforms: Parallel Solvers and Preconditioners

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Most viewed papers (last 30 days)

Parallel Sparse Linear Algebra for Multi-core and Many-core Platforms: Parallel Solvers and Preconditioners

Share this:

Recent source codes

Most viewed papers (last 30 days)