CUDA implementation of the solution of a system of linear equations arising in an hp-Finite Element code
Departamento de Matematica Aplicada y Estadistica e I.O., University of the Basque Country UPV/EHU, and Ikerbasque
Universidad de Zaragoza, 2013
@article{villanueva2013cuda,
title={CUDA implementation of the solution of a system of linear equations arising in an hp-Finite Element code},
author={Villanueva, Javier Os{\'e}s},
year={2013}
}
The Finite Element Method (FEM) has proven to be one of the most efficient methods for solving differential equations. Because it can run on different computer architectures, it has benefited over the years from technological improvements that allow ever larger problems to be solved quickly. Among these improvements, we emphasize the development of Graphics Processing Units (GPUs). Scientific programming on graphics cards was extremely difficult until 2006, when NVIDIA developed CUDA (Compute Unified Device Architecture), a programming model for general-purpose computing that does not require knowledge of traditional graphics programming. GPUs can perform a large number of operations simultaneously, which makes them very attractive for the FEM. One of the most computationally demanding parts of the FEM is the solution of systems of linear equations. In this work, an algorithm for solving systems of linear equations has been implemented in CUDA. It is intended to be used as part of an hp-FEM code that solves the Laplace equation. The aim of this study is to compare the performance of a CUDA implementation of the solver with that of a C implementation, and to check whether CUDA offers advantages over traditional programming. For that purpose, we select an algorithm suited to GPU programming. Iterative algorithms have properties that fit the CUDA programming architecture well. However, these algorithms require double-precision arithmetic to minimize round-off effects, and at present only high-performance GPUs work efficiently in double precision. FEM matrices are sparse, so the system matrix must be stored in a compressed format. Several compression formats exist; we select the one that best fits the matrix structure generated by the FEM in our problem. The CUDA implementation reduces execution times compared with traditional programming in C.
Recent works have shown that programs up to 80 times faster can be obtained. However, this result cannot be generalized, because the improvement depends on the differential equation, the boundary conditions, the mesh generation, the FEM formulation, the GPU model, the CUDA version (currently 5.0) and, of course, the implementation.
May 15, 2013 by hgpu