GPU Parallelization for Unstructured Sparse Matrix Problems with OpenMP 4.5 and OpenACC
University of Graz, 8010 Graz, Austria
University of Graz, SFB-Report No. 2017-010, 2017
@techreport{rosenberger2017gpu,
title={GPU Parallelization for Unstructured Sparse Matrix Problems with OpenMP 4.5 and OpenACC},
author={Rosenberger, S and Haase, G},
institution={University of Graz},
type={SFB-Report},
number={2017-010},
year={2017}
}
The effective use of parallel hardware is an important goal of today’s computer developments, and Nvidia GPUs play an important role in this context. While algorithms implemented in CUDA focus on detailed, optimized use of GPU hardware elements, pragma-directive parallelization makes GPU computation accessible to a broader community. In this paper we focus on implementing OpenACC and OpenMP 4.5 parallelization on Nvidia GPUs for a sparse matrix solver on unstructured discretizations. We show the similarities between these methods and their current performance differences. We also examine the possibilities for forcing pragma-directive-parallelized GPU code to use a specific vectorization. Finally, we demonstrate the performance of these methods in a complexly structured C++ implementation of the CG and GMRES methods with an algebraic multigrid preconditioner.
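As an illustration of the pragma-directive approach the abstract describes, the following is a minimal sketch of a sparse matrix-vector product, the core kernel of CG and GMRES. It is not taken from the report: the CSR storage layout, the names `CsrMatrix` and `spmv`, and the vector length of 128 are assumptions chosen for the example. The same loop carries either an OpenACC or an OpenMP 4.5 directive, and compiles to ordinary serial code when neither is enabled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR (compressed sparse row) container, assumed for this sketch.
struct CsrMatrix {
    int n_rows;
    int n_cols;
    std::vector<int> row_ptr;   // size n_rows + 1
    std::vector<int> col_idx;   // size nnz
    std::vector<double> val;    // size nnz
};

// Computes y = A * x. With an OpenACC compiler the first directive is used,
// with an OpenMP 4.5 offloading compiler the second; otherwise the loop
// simply runs serially on the host.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x)
{
    assert(static_cast<int>(x.size()) == A.n_cols);
    assert(static_cast<int>(A.row_ptr.size()) == A.n_rows + 1);

    const int n_rows = A.n_rows;
    const int n_cols = A.n_cols;
    const int nnz = A.row_ptr[n_rows];

    std::vector<double> y(n_rows, 0.0);

    // Raw pointers are used because the data clauses take array sections.
    const int* row_ptr = A.row_ptr.data();
    const int* col_idx = A.col_idx.data();
    const double* val = A.val.data();
    const double* xp = x.data();
    double* yp = y.data();

#if defined(_OPENACC)
    // vector_length(128) is an assumed value that requests a specific
    // vector width instead of leaving the choice to the compiler.
    #pragma acc parallel loop gang vector vector_length(128) \
        copyin(row_ptr[0:n_rows + 1], col_idx[0:nnz], val[0:nnz], xp[0:n_cols]) \
        copyout(yp[0:n_rows])
#elif defined(_OPENMP)
    #pragma omp target teams distribute parallel for \
        map(to: row_ptr[0:n_rows + 1], col_idx[0:nnz], val[0:nnz], xp[0:n_cols]) \
        map(from: yp[0:n_rows])
#endif
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        // The row length varies from row to row on unstructured problems.
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            sum += val[k] * xp[col_idx[k]];
        }
        yp[i] = sum;
    }
    return y;
}
```

Both variants assign one matrix row per loop iteration, which is the simplest mapping. In a real solver the data transfers would presumably be hoisted out of the kernel so that the matrix stays on the device across iterations.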
November 21, 2017 by hgpu