high performance computing on graphics processing units: hgpu.org

Performance and energy optimization of the iterative solution of sparse linear systems on multicore processors

Maria Barreda Vaya

Universitat Jaume I De Castello

Universitat Jaume I De Castello, 2017

@article{orti2017performance,

title={Performance and energy optimization of the iterative solution of sparse linear systems on multicore processors},

author={Ort{\'i}, Enrique S Quintana},

year={2017}

}

Large sparse systems of linear equations are ubiquitous in scientific and engineering applications and in big-data analytics. The importance of these applications, together with the fact that the solution of the linear system is usually a significant, time-consuming stage, has promoted the design and high-performance implementation of numerous matrix storage formats, algorithms, and libraries that efficiently tackle sparse instances of these linear algebra problems on general-purpose processors (GPPs), following the evolution of computer architectures. High Performance Computing (HPC) architectures enable the solution of complex applications by aggregating a number of multicore processors. As a consequence, developers face the challenge of implementing parallel algorithms that efficiently exploit the concurrency of the hardware. Furthermore, the growth in the number of transistors that can be integrated in a circuit has not been matched by a proportional reduction of the power dissipated by CMOS technology, turning the power wall into a crucial challenge that the HPC community needs to address. Unfortunately, despite the importance of energy consumption, few software developers take it into account in their implementations. In this dissertation we target the solution of large sparse systems of linear equations using preconditioned iterative methods based on Krylov subspaces. Specifically, we focus our efforts on ILUPACK, a library that offers multi-level Incomplete LU (ILU) preconditioners for the effective solution of sparse linear systems. The growing number of equations in these systems and the introduction of new HPC architectures motivate us to develop a parallel version of ILUPACK that optimizes both execution time and energy consumption on current multicore architectures and on clusters of nodes built from this type of technology.
Thus, the main goal of this thesis is the design, implementation, and evaluation of parallel and energy-efficient iterative sparse linear system solvers for multicore processors as well as recent manycore accelerators such as the Intel Xeon Phi. To fulfill the general objective of the thesis, we optimize ILUPACK by exploiting task parallelism via the programming models underlying OmpSs, MPI, and a combination of both. These implementations are also tuned for execution on specialized architectures such as Non-Uniform Memory Access (NUMA) platforms or the Intel Xeon Phi. Finally, the energy efficiency of the solver is evaluated on different multicore platforms, taking advantage of an automatic framework to detect power sinks, also developed as part of this thesis.
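To make the idea of a preconditioned Krylov subspace method concrete, the following is a minimal, sequential sketch in pure Python. It is not ILUPACK and does not reproduce the multi-level ILU preconditioners or the task-parallel implementations described in the abstract. As a simplifying assumption it uses the much simpler ILU(0) factorization (no fill-in) together with the conjugate gradient method on a 2D Poisson model problem. All function names (`poisson2d`, `ilu0`, `ilu_solve`, `pcg`) are hypothetical and chosen only for this illustration.

```python
import math

def poisson2d(n):
    """5-point Laplacian on an n x n grid, stored as a list of {column: value} rows."""
    rows = []
    for i in range(n):
        for j in range(n):
            k = i * n + j
            row = {k: 4.0}
            if i > 0: row[k - n] = -1.0
            if i < n - 1: row[k + n] = -1.0
            if j > 0: row[k - 1] = -1.0
            if j < n - 1: row[k + 1] = -1.0
            rows.append(row)
    return rows

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [sum(v * x[j] for j, v in row.items()) for row in A]

def ilu0(A):
    """ILU(0): Gaussian elimination restricted to the sparsity pattern of A."""
    LU = [dict(row) for row in A]
    for i in range(len(LU)):
        row = LU[i]
        for k in sorted(c for c in row if c < i):
            row[k] /= LU[k][k]                 # multiplier l_ik, stored in place
            for j, u_kj in LU[k].items():
                if j > k and j in row:         # update existing entries only: no fill-in
                    row[j] -= row[k] * u_kj
    return LU

def ilu_solve(LU, r):
    """Solve (L U) z = r; L has an implicit unit diagonal, U holds the diagonal."""
    n = len(LU)
    y = [0.0] * n
    for i in range(n):                         # forward substitution with L
        y[i] = r[i] - sum(v * y[j] for j, v in LU[i].items() if j < i)
    z = [0.0] * n
    for i in reversed(range(n)):               # backward substitution with U
        s = y[i] - sum(v * z[j] for j, v in LU[i].items() if j > i)
        z[i] = s / LU[i][i]
    return z

def pcg(A, b, precond, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:   # relative residual test
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Model problem with a known solution, so the error can be checked.
A = poisson2d(16)
x_true = [math.sin(k) for k in range(len(A))]
b = matvec(A, x_true)

LU = ilu0(A)
x_plain, iters_plain = pcg(A, b, lambda r: list(r))          # no preconditioner
x_ilu, iters_ilu = pcg(A, b, lambda r: ilu_solve(LU, r))     # ILU(0) preconditioner
print("CG iterations without preconditioner:", iters_plain)
print("CG iterations with ILU(0):", iters_ilu)
```

On this model problem the preconditioned run should need noticeably fewer iterations than the unpreconditioned one, which is the effect a preconditioner is meant to deliver. The triangular solves in `ilu_solve` are inherently sequential in this form, which hints at why parallelizing an ILU-preconditioned solver is a nontrivial task.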

Tags: Algorithms, Computer science, CUDA, Energy-efficient computing, Intel Xeon Phi, Linear Algebra, MPI, nVidia, OmpSs, Tesla C2050, Thesis

April 11, 2017 by hgpu
