Parallel experiments with RARE-BLAS
Univ. Perpignan Via Domitia, Digits, Architectures et Logiciels Informatiques, F-66860, Perpignan
18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2016
@inproceedings{chohra:lirmm-01349698,
title={Parallel experiments with RARE-BLAS},
author={Chohra, Chemseddine and Langlois, Philippe and Parello, David},
url={http://hal-lirmm.ccsd.cnrs.fr/lirmm-01349698},
booktitle={SYNASC: Symbolic and Numeric Algorithms for Scientific Computing},
address={Timisoara, Romania},
year={2016},
month={Sep},
keywords={Numerical reproducibility; floating-point arithmetic; RARE-BLAS; BLAS},
pdf={http://hal-lirmm.ccsd.cnrs.fr/lirmm-01349698/file/SYNASC.pdf},
hal_id={lirmm-01349698},
hal_version={v1}
}
Numerical reproducibility failures arise in parallel computation because floating-point summation is not associative. Optimizations on massively parallel systems dynamically modify the order of floating-point operations, so numerical results may change from one run to another. We propose to ensure reproducibility by extending, as far as possible, the IEEE-754 correct rounding property to longer sequences of operations. Our RARE-BLAS (Reproducible, Accurately Rounded and Efficient BLAS) builds on recent accurate and efficient summation algorithms. Solutions are provided for level 1 (asum, dot and nrm2) and level 2 (gemv) routines. We compare their performance with the Intel MKL library and with other existing reproducible algorithms. On both shared and distributed memory parallel systems, we observe an extra cost of 2x in the worst case, which is acceptable for a wide range of applications. On the Intel Xeon Phi accelerator, a larger extra cost (4x to 6x) is observed, which remains useful at least for debugging and validation.
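The two key points above, that changing the summation order changes the computed result and that a correctly rounded sum does not depend on that order, can be illustrated with a small Python sketch. It is not the RARE-BLAS implementation: the paper relies on fast accurate summation algorithms, whereas this sketch accumulates exact rational values with the standard `fractions` module and rounds once at the end, which is far slower but exhibits the same reproducibility property. The helper names (`naive_sum`, `split`, `chunked_naive_sum`, `exact_chunk`, `reproducible_sum`) and the model of "threads" as contiguous chunks are hypothetical illustrations, and the inputs are assumed to be finite doubles.

```python
from fractions import Fraction
from itertools import permutations

def naive_sum(xs):
    """Left-to-right IEEE-754 double summation (one rounding per addition)."""
    s = 0.0
    for x in xs:
        s += x
    return s

def split(xs, p):
    """Hypothetical model of p threads: p contiguous, nearly equal chunks."""
    n = len(xs)
    return [xs[n * k // p : n * (k + 1) // p] for k in range(p)]

def chunked_naive_sum(xs, p):
    """Each 'thread' rounds its own partial sum; partials are then summed."""
    return naive_sum([naive_sum(c) for c in split(xs, p)])

def exact_chunk(xs):
    """Exact chunk sum: Fraction(float) and Fraction addition are both exact."""
    return sum((Fraction(x) for x in xs), Fraction(0))

def reproducible_sum(xs, p):
    """Exact partials, combined exactly, then a single final rounding.
    float() of a Fraction rounds the exact value to nearest in CPython, so
    the result is the same whatever the order of xs and the chunk count p."""
    return float(sum((exact_chunk(c) for c in split(xs, p)), Fraction(0)))

# Non-associativity in miniature: the two groupings print different values.
print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))

# The exact sum is 2.0, but 1e16 + 1.0 is a tie that rounds back to 1e16.
data = [1e16, 1.0, -1e16, 1.0]

naive_results = {chunked_naive_sum(list(perm), p)
                 for perm in permutations(data) for p in range(1, 5)}
rounded_results = {reproducible_sum(list(perm), p)
                   for perm in permutations(data) for p in range(1, 5)}

print("naive:", sorted(naive_results))
print("correctly rounded:", sorted(rounded_results))
```

Across all orderings and chunk counts, the naive sums disagree with each other, while the correctly rounded version returns a single value. Exact rational arithmetic is only a stand-in here; the overheads reported in the abstract concern the paper's optimized algorithms, not this approach.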
August 4, 2016 by hgpu