Accelerated Sparse Matrix Operations in Nonlinear Least Squares Solvers


Lukas Polok

Department of Computer Graphics and Multimedia, Brno University of Technology

Brno University of Technology, 2017

@article{polok2017accelerated,
  title={Accelerated Sparse Matrix Operations in Nonlinear Least Squares Solvers},
  author={Polok, Luk{\'a}{\v{s}}},
  year={2017}
}


This thesis focuses on data structures for sparse block matrices and the associated algorithms for performing linear algebra operations that I have developed. Sparse block matrices occur naturally in many key problems, such as Nonlinear Least Squares (NLS) on graphical models. NLS is used, for example, in Simultaneous Localization and Mapping (SLAM) in robotics and in Bundle Adjustment (BA) or Structure from Motion (SfM) in computer vision. Sparse block matrices also occur in physics simulations, when applying the Finite Element Method (FEM) or solving Partial Differential Equations (PDEs). The majority of existing state-of-the-art sparse linear algebra implementations use elementwise sparse matrices, and only a small fraction of them support sparse block matrices. This is perhaps due to the complexity of sparse block formats, which reduces computational efficiency unless the blocks are very large. Some of the more specialized solvers in robotics and computer vision use sparse block matrices internally to reduce sparse matrix assembly costs, but ultimately convert this representation to an elementwise sparse matrix for the linear solver. Most existing sparse block matrix implementations focus on a single operation only, such as the matrix-vector product. The solution proposed in this thesis covers a broad range of functions: it includes efficient sparse block matrix assembly, matrix-vector and matrix-matrix products, as well as triangular solving and Cholesky factorization. These operations can be used to construct both direct and iterative solvers, as well as to compute eigenvalues. Highly efficient algorithms for both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are provided. The proposed solution is integrated in SLAM++, a nonlinear least squares solver focused on robotics and computer vision. It is evaluated on standard datasets, where it significantly outperforms other similar state-of-the-art implementations without sacrificing generality or accuracy.
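To make the notion of a sparse block matrix concrete, the following is a minimal illustrative sketch of block assembly and a block matrix-vector product. It is not the data structure or API from the thesis or from SLAM++: the class name `BlockSparseMatrix`, its methods, and the dictionary-of-blocks layout are assumptions chosen for brevity, and the code makes no attempt at the cache or GPU efficiency the abstract refers to.

```python
from typing import Dict, List, Tuple

Block = List[List[float]]  # one dense block, stored row-major


class BlockSparseMatrix:
    """Toy sparse block matrix (hypothetical, not the SLAM++ structure).

    Only nonzero blocks are stored, keyed by (block_row, block_col).
    """

    def __init__(self, row_sizes: List[int], col_sizes: List[int]) -> None:
        self.row_sizes = row_sizes
        self.col_sizes = col_sizes
        # Prefix sums map a block index to its first scalar row/column.
        self.row_offsets = self._offsets(row_sizes)
        self.col_offsets = self._offsets(col_sizes)
        self.blocks: Dict[Tuple[int, int], Block] = {}

    @staticmethod
    def _offsets(sizes: List[int]) -> List[int]:
        offsets = [0]
        for size in sizes:
            offsets.append(offsets[-1] + size)
        return offsets

    def add_block(self, bi: int, bj: int, block: Block) -> None:
        """Assembly step: accumulate a dense block at block position (bi, bj)."""
        rows, cols = self.row_sizes[bi], self.col_sizes[bj]
        if len(block) != rows or any(len(row) != cols for row in block):
            raise ValueError("block shape does not match the block layout")
        target = self.blocks.setdefault(
            (bi, bj), [[0.0] * cols for _ in range(rows)]
        )
        for r in range(rows):
            for c in range(cols):
                target[r][c] += block[r][c]

    def matvec(self, x: List[float]) -> List[float]:
        """Compute y = A x by visiting only the stored blocks."""
        if len(x) != self.col_offsets[-1]:
            raise ValueError("vector length does not match matrix width")
        y = [0.0] * self.row_offsets[-1]
        for (bi, bj), block in self.blocks.items():
            r0, c0 = self.row_offsets[bi], self.col_offsets[bj]
            for r, row in enumerate(block):
                acc = 0.0
                for c, value in enumerate(row):
                    acc += value * x[c0 + c]
                y[r0 + r] += acc
        return y
```

In this sketch, calling `add_block` twice on the same block position sums the contributions, which mirrors how NLS solvers typically accumulate per-measurement terms into a system matrix. An elementwise format would store each scalar with its own index, whereas here one index pair covers a whole dense block.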

Tags: Algorithms, Computer science, Differential equations, Factorization, FEM, Finite element method, Linear Algebra, nVidia, nVidia GeForce GTX 680, OpenCL, Partial differential equations, PDEs, Sparse matrix, Tesla K40, Thesis

December 19, 2017 by hgpu
