high performance computing on graphics processing units: hgpu.org

On the design of sparse hybrid linear solvers for modern parallel architectures

Stojce Nakov

LaBRI – Laboratoire Bordelais de Recherche en Informatique

tel-01304315, (19 Apr 2016)

@phdthesis{nakov2015design,
  title={On the design of sparse hybrid linear solvers for modern parallel architectures},
  author={Nakov, Stojce},
  year={2015},
  school={Bordeaux}
}

In this thesis, our focus is on numerical linear algebra, more precisely on the solution of large sparse systems of linear equations. We focus on designing efficient parallel implementations of MaPHyS, a hybrid linear solver based on domain decomposition techniques. First, we investigate the MPI+threads approach. In MaPHyS, the first level of parallelism arises from the independent treatment of the various subdomains. The second level is exploited through the multi-threaded dense and sparse linear algebra kernels involved at the subdomain level. Such a hybrid implementation of a hybrid linear solver suitably matches the hierarchical structure of modern supercomputers and enables a trade-off between the numerical and parallel performance of the solver. We demonstrate the flexibility of our parallel implementation on a set of test examples. Secondly, we follow a more disruptive approach where the algorithms are described as sets of tasks with data inter-dependencies that lead to a directed acyclic graph (DAG) representation. The tasks are handled by a runtime system. We illustrate how a first task-based parallel implementation can be obtained by composing task-based parallel libraries within MPI processes through a preliminary prototype implementation of our hybrid solver. We then show how a task-based approach fully abstracting the hardware architecture can successfully exploit a wide range of modern hardware architectures. We implemented a full task-based Conjugate Gradient algorithm and showed that the proposed approach leads to very high performance on multi-GPU, multicore and heterogeneous architectures.
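For readers unfamiliar with the Conjugate Gradient algorithm mentioned in the abstract, the following is a minimal sequential sketch in pure Python. It is not the thesis's task-based implementation and is unrelated to the MaPHyS code base. The function names (`csr_matvec`, `conjugate_gradient`, `laplacian_1d_csr`), the CSR storage layout, and the 1D Laplacian test matrix are all assumptions chosen for illustration. The sparse matrix-vector product, the dot products, and the vector updates inside the loop are the kind of kernels that a task-based version would presumably express as tasks with data dependencies.

```python
def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A.

    Returns the approximate solution and the number of iterations used.
    Starts from x = 0 and stops when ||r|| <= tol * ||b||.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual r = b - A x, with x = 0
    p = list(r)              # initial search direction
    rs_old = dot(r, r)
    b_norm = dot(b, b) ** 0.5 or 1.0   # avoid dividing by zero when b = 0
    for iteration in range(max_iter):
        if rs_old ** 0.5 <= tol * b_norm:
            return x, iteration
        ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]      # new direction
        rs_old = rs_new
    return x, max_iter


def laplacian_1d_csr(n):
    """Build the n x n tridiagonal matrix [-1, 2, -1] in CSR form (illustrative test matrix)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, value in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(value)
        indptr.append(len(indices))
    return indptr, indices, data
```

A possible use is to build a small system with a known solution, for example `indptr, indices, data = laplacian_1d_csr(8)`, form `b = csr_matvec(indptr, indices, data, [1.0] * 8)`, and check that `conjugate_gradient(indptr, indices, data, b)` recovers a vector close to all ones.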

Tags: Algorithms, Computer science, CUDA, Heterogeneous systems, Linear Algebra, MPI, nVidia, Sparse linear iterative solvers, Sparse matrix, Tesla K40, Tesla M2070, Thesis

April 29, 2016 by hgpu
