Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms
Centro de Investigacion en Tecnoloxias da Informacion (CITIUS), Univ. of Santiago de Compostela, Spain
The 2011 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’11), 2011
@inproceedings{quesada2011selecting,
title={Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms},
author={Quesada-Barriuso, Pablo and Lamas-Rodriguez, Julian and Heras, Dora B. and Boo, Montserrat and Arguello, Francisco},
booktitle={The 2011 International Conference on Parallel and Distributed Processing Techniques and Applications},
year={2011}
}
Multi-core processors and graphics cards are now commodity hardware found in personal computers, and both CPUs and GPUs are capable of high-end computation. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method as an example of fine-grained parallelism and Bondeli’s algorithm as an example of coarse-grained parallelism. Both algorithms are implemented for GPU architectures using CUDA and for shared-memory multi-core CPU architectures using OpenMP. The results are compared in terms of execution time, speedup, and GFLOPS. For a large system of 2^22 equations, Bondeli’s algorithm gave the best results on multi-core CPU platforms (speedup 1.55x and 0.84 GFLOPS), while cyclic reduction gave the best results on GPU platforms (speedup 17.06x and 5.09 GFLOPS).
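To make the fine-grained parallelism of cyclic reduction concrete, the following is a minimal serial sketch in Python. It is not the paper's CUDA or OpenMP implementation. The function name `cyclic_reduction` and its argument layout are hypothetical, chosen only for illustration. The sketch assumes the textbook form of the method, a system size of n = 2^k - 1, and a matrix (for example, a diagonally dominant one) for which no pivoting is needed. Within each level, every iteration of the inner loop is independent of the others, which is the property a GPU kernel would presumably exploit by assigning one thread per equation.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction (illustrative sketch).

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[n-1] == 0.
    Assumes n = 2**k - 1 and that no pivoting is required.
    """
    n = len(b)
    if (n + 1) & n != 0:
        raise ValueError("this sketch requires n = 2**k - 1")
    # Work on copies so the caller's lists are left untouched.
    a, b, c, d = list(a), list(b), list(c), list(d)

    # Forward reduction: at stride s, eliminate the neighbours at distance s
    # from every equation i = 2s-1, 4s-1, ... The iterations of the inner
    # loop do not depend on each other and could run in parallel.
    s = 1
    while 2 * s - 1 < n:
        for i in range(2 * s - 1, n, 2 * s):
            lo, hi = i - s, i + s
            alpha = -a[i] / b[lo]
            gamma = -c[i] / b[hi]
            b[i] += alpha * c[lo] + gamma * a[hi]
            d[i] += alpha * d[lo] + gamma * d[hi]
            a[i] = alpha * a[lo]   # now couples x[i] with x[i - 2s]
            c[i] = gamma * c[hi]   # now couples x[i] with x[i + 2s]
        s *= 2

    # Back substitution: start from the single middle equation and halve
    # the stride, filling in the unknowns eliminated at each level.
    x = [0.0] * n
    s = (n + 1) // 2
    while s >= 1:
        for i in range(s - 1, n, 2 * s):
            left = x[i - s] if i - s >= 0 else 0.0
            right = x[i + s] if i + s < n else 0.0
            x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
        s //= 2
    return x
```

As a usage example, one can build a small diagonally dominant system from a known solution, compute its right-hand side, and check that `cyclic_reduction` recovers that solution to within rounding error.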
January 5, 2012 by hgpu