GPU acceleration of preconditioned solvers for ill-conditioned linear systems
Technische Universiteit Delft, 2015
@phdthesis{gupta2015gpu,
title={GPU acceleration of preconditioned solvers for ill-conditioned linear systems},
author={Gupta, Rohit},
year={2015},
school={TU Delft, Delft University of Technology}
}
In this work we study implementations of deflation and preconditioning techniques for solving ill-conditioned linear systems with iterative methods. Solving such systems can be time-consuming because large differences in material properties cause jumps in the coefficients. We have implemented the iterative methods with these preconditioning techniques on the GPU and on multi-core CPUs in order to significantly reduce the computing time. The problems we have chosen have a symmetric positive definite coefficient matrix. We have further extended these implementations so that they scale on clusters of GPUs and multi-core CPUs. We outline the challenges involved in making a robust preconditioned solver that is suitable for scaling in a parallel environment. To start with, we experiment with simple techniques to establish the feasibility of implementing specialized preconditioning schemes (deflation) for specific problems (bubbly flow). We provide an analysis of the choices we make for implementing certain parts of these operations (e.g. the solution of the inner system in deflation) and tune the data structures to the capabilities of the hardware. We improve our solvers by refining how these techniques are applied (Neumann preconditioning and better deflation vectors). For the different options available, we compare their effect as problem parameters (e.g. the number of bubbles and of deflation vectors) are varied. After testing our methods on standalone machines with multi-core CPUs and a single GPU, we make a parallel implementation using MPI. We explore different data divisions in order to establish the effect of communication, and choose the more scalable approach. To make our results more comprehensive, we also test the implementations on two different clusters. We show the results for two versions of our code: one for multi-core CPUs and another for multiple GPUs per node.
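The deflation and Neumann preconditioning steps named in the abstract have a standard algebraic form, summarized below as a sketch in our own notation rather than the thesis's. Here Z is the n × k matrix of deflation vectors (for example subdomain indicators), E = Z^T A Z is the inner system, and A = L + D + L^T splits A into its strictly lower triangular, diagonal and strictly upper triangular parts. The first-order truncated Neumann series preconditioner shown is one common variant; that the thesis uses exactly this form is an assumption.

```latex
% Deflation: inner system E, coarse correction Q, projector P
E = Z^{T} A Z, \qquad Q = Z E^{-1} Z^{T}, \qquad P = I - A Q

% CG solves the deflated, preconditioned system; x is then recovered
M^{-1} P A \hat{x} = M^{-1} P b, \qquad x = Q b + P^{T} \hat{x}

% Truncated Neumann series: keep one term of (I + L D^{-1})^{-1} in M = (D + L) D^{-1} (D + L)^{T}
(D + L)^{-1} = D^{-1} \left(I + L D^{-1}\right)^{-1} \approx D^{-1} K, \qquad K = I - L D^{-1}
M^{-1} = (D + L)^{-T} D \, (D + L)^{-1} \approx K^{T} D^{-1} K
```

The recovery step needs no further solve: because A P^T = P A = A - A Q A, the recovered x satisfies A x = A Q b + P A x̂ = A Q b + P b = b. The approximate M^{-1} is symmetric positive definite, since D has a positive diagonal and K is unit lower triangular.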
Our methods show strong scaling behavior. To further evaluate the use of deflation combined with a simple preconditioning technique, we test our implementation of the iterative method on linear systems from porous media flow problems. For this we use a library that offers many different kinds of preconditioners on parallel platforms, and we test implementations with and without deflation. Our results show good performance of the iterative methods we use in combination with deflation and preconditioning for a number of problems. Through our experiments we demonstrate the effectiveness of deflation for implementation on parallel platforms and extend its applicability to problems from different domains.
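A small, self-contained sketch helps make the deflated solver evaluated above concrete. It is a hypothetical pure-Python toy, not the thesis code and not the preconditioner library it mentions. The problem is a 1D diffusion problem with two low-coefficient regions standing in for bubbles; the contrast of 1e-3 is our assumption. The sketch uses subdomain-indicator deflation vectors Z, solves the inner system E = Z^T A Z by dense Gaussian elimination, and applies the projector P = I - A Z E^{-1} Z^T together with an assumed first-order truncated Neumann series preconditioner M^{-1} = K^T D^{-1} K, K = I - L D^{-1}. Passing nsub=0 switches deflation off, so the same routine runs with and without deflation. All function names are illustrative.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def tridiag_matvec(d, off, x):
    """y = A x for symmetric tridiagonal A (diagonal d, off-diagonal off)."""
    y = [di * xi for di, xi in zip(d, x)]
    for i, a in enumerate(off):
        y[i] += a * x[i + 1]
        y[i + 1] += a * x[i]
    return y

def neumann_precond(d, off, r):
    """Apply M^{-1} = K^T D^{-1} K, K = I - L D^{-1} (assumed first-order TNS)."""
    n = len(d)
    y = [r[0]] + [r[i] - off[i - 1] * r[i - 1] / d[i - 1] for i in range(1, n)]  # K r
    z = [yi / di for yi, di in zip(y, d)]  # D^{-1} K r
    return [z[i] - off[i] * z[i + 1] / d[i] for i in range(n - 1)] + [z[-1]]  # K^T z

def solve_dense(E, rhs):
    """Gaussian elimination with partial pivoting for the small inner system."""
    k = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(E)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (a[r][k] - sum(a[r][j] * x[j] for j in range(r + 1, k))) / a[r][r]
    return x

def make_deflation(d, off, nsub):
    """Return Q, P and P^T as functions, with subdomain-indicator deflation vectors Z."""
    n = len(d)
    owner = [i * nsub // n for i in range(n)]  # subdomain of each unknown
    def zt(v):  # Z^T v: sum of v over each subdomain
        s = [0.0] * nsub
        for vi, o in zip(v, owner):
            s[o] += vi
        return s
    # Inner system E = Z^T A Z, assembled one deflation vector (column) at a time
    cols = [zt(tridiag_matvec(d, off, [float(o == j) for o in owner])) for j in range(nsub)]
    E = [[cols[j][i] for j in range(nsub)] for i in range(nsub)]
    def q(v):  # Q v = Z E^{-1} Z^T v
        y = solve_dense(E, zt(v))
        return [y[o] for o in owner]
    def p(v):  # P v = v - A Q v
        return [vi - wi for vi, wi in zip(v, tridiag_matvec(d, off, q(v)))]
    def pt(v):  # P^T v = v - Q A v
        return [vi - wi for vi, wi in zip(v, q(tridiag_matvec(d, off, v)))]
    return q, p, pt

def dpcg(d, off, b, nsub=0, tol=1e-8, maxiter=1000):
    """Deflated PCG on A x = b; nsub=0 gives plain PCG. Returns (x, iterations)."""
    n = len(b)
    if nsub > 0:
        q, p, pt = make_deflation(d, off, nsub)
    else:  # no deflation: Q = 0 and P = P^T = I
        q, p, pt = (lambda v: [0.0] * n), (lambda v: v), (lambda v: v)
    xh, r = [0.0] * n, p(b)  # for xh = 0 the deflated residual is P b
    y = neumann_precond(d, off, r)
    s, rho, bnorm, it = y[:], dot(r, y), math.sqrt(dot(b, b)), 0
    while math.sqrt(dot(r, r)) > tol * bnorm and it < maxiter:
        w = p(tridiag_matvec(d, off, s))  # w = P A s
        alpha = rho / dot(s, w)
        xh = [xi + alpha * si for xi, si in zip(xh, s)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        y = neumann_precond(d, off, r)
        rho_new = dot(r, y)
        s = [yi + (rho_new / rho) * si for yi, si in zip(y, s)]
        rho, it = rho_new, it + 1
    # Recover the solution of A x = b from the deflated one: x = Q b + P^T xh
    return [qi + ti for qi, ti in zip(q(b), pt(xh))], it

def bubbly_1d(n=64, contrast=1e-3):
    """Hypothetical test matrix: -(k u')' = 1 with two low-k 'bubbles', Dirichlet ends."""
    k = [contrast if n // 4 <= e < 3 * n // 8 or 5 * n // 8 <= e < 3 * n // 4 else 1.0
         for e in range(n + 1)]
    d = [k[i] + k[i + 1] for i in range(n)]
    off = [-k[i + 1] for i in range(n - 1)]
    return d, off

if __name__ == "__main__":
    d, off = bubbly_1d()
    b = [1.0] * len(d)  # unit source term
    for nsub in (0, 8):  # without deflation, then with 8 subdomain deflation vectors
        x, its = dpcg(d, off, b, nsub=nsub)
        print(f"subdomains={nsub}: {its} PCG iterations")
```

Running the script prints the iteration counts with and without deflation for this toy problem. The counts only illustrate the mechanics and say nothing about the GPU or cluster timings reported in the thesis. A real implementation would store A in a sparse format and use parallel kernels for the vector operations.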
October 11, 2015 by hgpu