high performance computing on graphics processing units: hgpu.org


Performance and energy optimization of the iterative solution of sparse linear systems on multicore processors

Maria Barreda Vaya

Universitat Jaume I De Castello

Universitat Jaume I De Castello, 2017

@article{orti2017performance,
  title={Performance and energy optimization of the iterative solution of sparse linear systems on multicore processors},
  author={Ort{i}, Enrique S Quintana},
  year={2017}
}


Large sparse systems of linear equations are ubiquitous in diverse scientific and engineering applications and in big-data analytics. The interest of these applications, and the fact that solving the linear system is usually a time-consuming stage, has promoted the design and high-performance implementation of numerous matrix storage formats, algorithms, and libraries to efficiently tackle sparse instances of these linear algebra problems on general-purpose processors (GPPs), following the evolution of computer architectures. High Performance Computing (HPC) architectures enable the solution of complex applications by aggregating a number of multicore processors. As a consequence, developers face the challenge of implementing parallel algorithms that efficiently exploit the concurrency of the hardware. Furthermore, the advances in the number of transistors that can be integrated in a circuit have not been accompanied by a proportional reduction of the power dissipated by CMOS technology, turning the power wall into a crucial challenge that the HPC community needs to address. Unfortunately, despite the importance of energy consumption, few software developers take it into account in their implementations. In this dissertation we target the solution of large sparse systems of linear equations using preconditioned iterative methods based on Krylov subspaces. Specifically, we focus our efforts on ILUPACK, a library that offers multi-level Incomplete LU (ILU) preconditioners for the effective solution of sparse linear systems. The increase in the number of equations in these systems and the introduction of new HPC architectures motivate us to develop a parallel version of ILUPACK that optimizes both execution time and energy consumption on current multicore architectures and on clusters of nodes built from this type of technology.
Thus, the main goal of this thesis is the design, implementation and evaluation of parallel and energy-efficient iterative sparse linear system solvers for multicore processors as well as recent manycore accelerators such as the Intel Xeon Phi. To fulfill the general objective of the thesis, we optimize ILUPACK by exploiting task parallelism via the programming models underlying OmpSs, MPI and a combination of both. These implementations are also tuned for execution on specialized architectures such as Non-Uniform Memory Access (NUMA) platforms or the Intel Xeon Phi. Finally, the energy efficiency of the solver is evaluated on different multicore platforms, taking advantage of an automatic framework to detect power sinks, also developed as part of this thesis.
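To make the phrase "preconditioned iterative methods based on Krylov subspaces" concrete, the following is a minimal serial sketch in plain Python. It is not ILUPACK's multi-level preconditioner and not the thesis code: it assumes a single-level ILU(0) factorization (no fill-in outside the sparsity pattern of the matrix) and the conjugate gradient method as simple stand-ins, applied to a 2D Laplacian test matrix chosen for illustration. All function names (`laplacian_2d`, `ilu0`, `apply_ilu`, `pcg`) are hypothetical.

```python
import math

def laplacian_2d(n):
    """5-point Laplacian on an n x n grid; each row is a {column: value} dict."""
    rows = []
    for i in range(n):
        for j in range(n):
            k = i * n + j
            row = {k: 4.0}
            if i > 0:
                row[k - n] = -1.0
            if i < n - 1:
                row[k + n] = -1.0
            if j > 0:
                row[k - 1] = -1.0
            if j < n - 1:
                row[k + 1] = -1.0
            rows.append(row)
    return rows

def matvec(A, x):
    """Sparse matrix-vector product y = A x."""
    return [sum(v * x[j] for j, v in row.items()) for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ilu0(A):
    """ILU(0): LU factors restricted to the sparsity pattern of A.
    Returns rows holding the strict lower part of L (unit diagonal implied)
    together with the diagonal and upper part of U."""
    F = [dict(row) for row in A]
    for i in range(len(F)):
        for k in sorted(c for c in F[i] if c < i):
            F[i][k] /= F[k][k]                 # multiplier l_ik
            for j, u_kj in F[k].items():
                if j > k and j in F[i]:        # update only existing entries
                    F[i][j] -= F[i][k] * u_kj
    return F

def apply_ilu(F, r):
    """Solve (L U) z = r by forward and then backward substitution."""
    n = len(F)
    y = [0.0] * n
    for i in range(n):
        y[i] = r[i] - sum(v * y[j] for j, v in F[i].items() if j < i)
    z = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(v * z[j] for j, v in F[i].items() if j > i)
        z[i] = s / F[i][i]
    return z

def pcg(A, b, precond, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite A.
    Returns the approximate solution and the number of iterations used."""
    x = [0.0] * len(b)
    r = list(b)                                # residual for the zero initial guess
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    A = laplacian_2d(16)
    x_true = [1.0 + 0.1 * (k % 7) for k in range(len(A))]
    b = matvec(A, x_true)
    F = ilu0(A)
    _, its_plain = pcg(A, b, lambda r: list(r))        # no preconditioner
    _, its_ilu = pcg(A, b, lambda r: apply_ilu(F, r))  # ILU(0) preconditioner
    print("iterations without preconditioner:", its_plain)
    print("iterations with ILU(0):", its_ilu)
```

In this sketch the triangular solves in `apply_ilu` are inherently sequential, which hints at why exposing task parallelism in an ILU-preconditioned solver, as the thesis sets out to do, requires restructuring the computation rather than simply parallelizing loops.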

Tags: Algorithms, Computer science, CUDA, Energy-efficient computing, Intel Xeon Phi, Linear Algebra, MPI, nVidia, OmpSs, Tesla C2050, Thesis

April 11, 2017 by hgpu
