Performance and numerical accuracy evaluation of heterogeneous multicore systems for Krylov orthogonal basis computation
Commissariat à l’Energie Atomique, CEA-Saclay/DEN/DANS/DM2S/SERMA/LLPR, F-91191 Gif-sur-Yvette Cedex, France
High Performance Computing for Computational Science – VECPAR 2010, Lecture Notes in Computer Science, Volume 6449/2011, 45-57, 2011
@article{dubois2011performance,
title={Performance and numerical accuracy evaluation of heterogeneous multicore systems for Krylov orthogonal basis computation},
author={Dubois, J. and Calvin, C. and Petiton, S.},
journal={High Performance Computing for Computational Science -- VECPAR 2010},
pages={45--57},
year={2011},
publisher={Springer}
}
We study the numerical behavior of heterogeneous systems, such as a CPU paired with a GPU or an IBM Cell processor, for several orthogonalization processes. We focus on how the different floating-point arithmetic handling of these accelerators affects Gram-Schmidt orthogonalization in single and double precision. For dense matrices, we observe a loss of at worst 1 digit of accuracy on CUDA-enabled GPUs with a 20x speed-up, and a loss of 2 digits on the Cell processor with a 7x speed-up. For sparse matrices, the CPU and GPU results are very close and the speed-up is 10x. We conclude that the Cell processor is a good accelerator for double precision because of its full IEEE compliance, but is not sufficient for single precision applications. The GPU speed-up is better than that of the Cell, and its decent IEEE support delivers results close to those of the CPU for both precisions.
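To make the single versus double precision comparison concrete, the following is a minimal sketch, not the authors' code. It runs Gram-Schmidt orthogonalization on the CPU only and measures the loss of orthogonality in each precision. Several things here are assumptions made for illustration: the modified variant of Gram-Schmidt (the abstract does not say which variant is used), the small Hilbert matrix as test input, the emulation of single precision by rounding every operation to binary32, and all function names. It does not reproduce GPU or Cell arithmetic.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def identity(x):
    """No extra rounding: keep native double precision."""
    return x


def dot(u, v, rnd):
    """Dot product where every multiply and add is rounded by `rnd`."""
    acc = 0.0
    for a, b in zip(u, v):
        acc = rnd(acc + rnd(a * b))
    return acc


def modified_gram_schmidt(vectors, rnd):
    """Orthonormalize `vectors` (a list of lists) with modified Gram-Schmidt.

    `rnd` is applied after each arithmetic operation, so passing `to_f32`
    emulates single precision (assumption: illustration only).
    """
    basis = []
    for v in vectors:
        w = [rnd(x) for x in v]
        for q in basis:
            h = dot(q, w, rnd)  # projection coefficient on q
            w = [rnd(wi - rnd(h * qi)) for wi, qi in zip(w, q)]
        norm = rnd(math.sqrt(dot(w, w, rnd)))
        basis.append([rnd(wi / norm) for wi in w])
    return basis


def orthogonality_loss(basis):
    """Largest entry of |Q^T Q - I|, evaluated in double precision."""
    worst = 0.0
    for i, qi in enumerate(basis):
        for j, qj in enumerate(basis):
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot(qi, qj, identity) - target))
    return worst


def hilbert_columns(n):
    """Columns of the n-by-n Hilbert matrix (ill-conditioned test input)."""
    return [[1.0 / (i + j + 1) for i in range(n)] for j in range(n)]


cols = hilbert_columns(4)
loss_single = orthogonality_loss(modified_gram_schmidt(cols, to_f32))
loss_double = orthogonality_loss(modified_gram_schmidt(cols, identity))
print(f"single precision loss of orthogonality: {loss_single:.3e}")
print(f"double precision loss of orthogonality: {loss_double:.3e}")
```

The base-10 logarithm of the ratio between the two printed values gives a rough count of the decimal digits lost when moving from double to single precision, which is the kind of metric the abstract reports in digits.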
November 30, 2011 by hgpu