## High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study

University of Almeria, Department of Informatics

University of Almeria, 2015

@article{lopez2015high,

title={High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study},

author={L{‘o}pez, Gloria Ortega},

year={2015}

}

This thesis, entitled "High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study" investigates the computational issues related to the resolution of linear systems of equations which come from the discretization of physical models described by means of Partial Differential Equations (PDEs). These physical models are conceived for the description of the space-temporary behavior of some physical phenomena f(x, y, z, t) in terms of their variations (partial derivative) with respect to the dependent variables of the phenomena. There is a wide variety of discretization methods for PDEs. Two of the most well-known methods are the Finite Difference Method (FDM) and the Finite Element Method (FEM). Both methods result in an algebraic description of the model that can be translated into the approach of a linear system of equations of type (Ax = b), where A is a sparse matrix (a high percentage of zero elements) whose size depends on the required accuracy of the modeled phenomena. This thesis begins with the algebraic description of the model associated with the physical phenomena, and the work herein has been focused on the design of techniques and computational models that allow the resolution of these linear systems of equations. The main interest of this study is specially focused on models which require a high level of discretization and usually generate sparse matrices, A, which have a highly sparse structure and large size. Literature characterizes these types of problems by their high demanding computational requirements (because of their fine degree of discretization) and the sparsity of the matrices involved, suggesting that these kinds of problems can only be solved using High Performance Computing techniques and architectures. Nowadays, the High Performance Computing architectures that are most widely used are the heterogeneous platforms based on distributed memory systems, where every node has a multi-core architecture with a different number of cores. Moreover, these architectures can also provide several accelerators, such as vector units, FPGAs, GPUs, Intel Xeon Phi coprocessors and a heterogeneous interconnection network composed of links with different bandwidth and latency. One of the main goals of this thesis is the research of the possible alternatives which allow the implementation of routines to solve large and sparse linear systems of equations using High Performance Computing (HPC). The use of massively parallel platforms (GPUs) allows the acceleration of these routines, because they have several advantages for vectorial computation schemes. On the other hand, the use of distributed memory platforms allows the resolution of problems defined by matrices of enormous size. Finally, the combination of both techniques, distributed computation and multi-GPUs, will allow faster resolution of interesting problems in which large and sparse matrices are involved. In this line, one of the goals of this thesis is to supply the scientific community with implementations based on multi-GPU clusters to solve sparse linear systems of equations, which are the key in many scientific computations. The second part of this thesis is focused on a real physical problem of Optical Diffractional Tomography (ODT) based on holographic information. ODT is a non-damaging technique which allows the extraction of the shapes of objects with high accuracy. Therefore, this technique is very suitable to the in vivo study of real specimens, microorganisms, etc., and it also makes the investigation of their dynamics possible. A preliminary physical model based on a bidimensional reconstruction of the seeding particle distribution in fluids was proposed by J. Lobera and J.M. Coupland. However, its high computational cost (in both memory requirements and runtime) made compulsory the use of HPC techniques to extend the implementation to a three dimensional model. In the second part of this thesis, the implementation and validation of this physical model for the case of three dimensional reconstructions is carried out. In such implementation, the resolution of large and sparse linear systems of equations is required. Thus, some of the algebraic routines developed in the first part of the thesis have been used to implement computational strategies capable of solving the problem of 3D reconstruction based on ODT. This thesis is organized into six chapters. The first is an introduction to the main research areas involved in this thesis and it also presents the materials and methods that were used. Firstly, several parallel architectures and parallel programming models are briefly described. This is followed by the discussion of the issues related to the methods for solving linear systems of equations, one of the keys of Linear Algebra. Finally, the platforms used to evaluate the experiments are described. Chapter 2 is dedicated to the computational aspects of sparse matrices on GPU platforms. First of all, a description of the most widely used compressed storage formats is made. Next, details of an implementation of the Sparse Matrix vector product (SpMV) based on ELLR-T format is shown. Finally, a routine for computing the Sparse Matrix Matrix product (SpMM) is proposed and evaluated on GPUs platforms. The implementation of the Sparse Matrix Matrix product in this study has shown to outperform other existing approaches. Chapter 3 deals with the development of a method to solve large and sparse linear systems of equations on GPUs, namely, the BiConjugate Gradient Method (BCG). This chapter discusses the implementation of the BCG method on GPUs using two different approaches to compute the SpMV operation. These routines are: a routine included in the CUSPARSE library and an implementation of the SpMV based on ELLR-T format. At the end of the chapter the experimental results are provided from an extensive evaluation carried out using a set of test matrices. Experiments have shown that the implementation of the BCG based on ELLR-T outperforms the other approach. In Chapter 4 the Helmholtz Equation is explored. It is a particular case of Partial Differential Equation whose discretization results in a large and sparse linear system of equations. This linear system of equations is characterized by the regularities of the large and sparse matrix involved in the resolution. The features of the Helmholtz Equation are described at the beginning of the chapter. Later, several multi-GPU implementations proposed to accelerate the resolution of the 3D Helmholtz Equation are discussed. In Chapter 5 the development and implementation of a new tomographic reconstruction technique based on holography and HPC techniques and architectures is described. This new method requires the resolution of the 3D Helmholtz Equation, whose discretization results in a complex, regular, large and sparse linear system of equations. BCG is the proposed solver to obtain the solution of the Helmholtz Equation. As a result, the development of the model is based on the implementation of BCG on GPUs exploiting the regularities of the sparse matrix as described in Chapter 4. In Chapter 6 a summary where the main results are outlined is provided. Moreover, future lines of research are presented.

March 30, 2015 by hgpu