Parallel experiments with RARE-BLAS

Chemseddine Chohra, Philippe Langlois, David Parello
Univ. Perpignan Via Domitia, Digits, Architectures et Logiciels Informatiques, F-66860, Perpignan
18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2016


   title={Parallel experiments with RARE-BLAS},
   author={Chohra, Chemseddine and Langlois, Philippe and Parello, David},
   booktitle={SYNASC: Symbolic and Numeric Algorithms for Scientific Computing},
   address={Timisoara, Romania},
   keywords={Numerical reproducibility; floating-point arithmetic; RARE-BLAS; BLAS},








Numerical reproducibility failures arise in parallel computation because floating-point summation is not associative. Optimizations on massively parallel systems dynamically modify the order of floating-point operations, so numerical results may change from one run to another. We propose to ensure reproducibility by extending, as far as possible, the IEEE-754 correct rounding property to larger operation sequences. Our RARE-BLAS (Reproducible, Accurately Rounded and Efficient BLAS) benefits from recent accurate and efficient summation algorithms. Solutions are provided for level 1 (asum, dot and nrm2) and level 2 (gemv) routines. We compare their performance to the Intel MKL library and to other existing reproducible algorithms. For both shared and distributed memory parallel systems, we observe an overhead of 2x in the worst case, which is acceptable for a wide range of applications. For the Intel Xeon Phi accelerator, a larger overhead (4x to 6x) is observed, which remains useful at least for debugging and validation.
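The non-associativity described in the abstract can be illustrated with a minimal sketch. It is not the RARE-BLAS implementation: the helper names `naive_sum` and `reproducible_sum` are hypothetical, and Python's standard `math.fsum` (which returns a correctly rounded sum on typical IEEE-754 platforms) stands in for the summation algorithms the paper relies on. The point is only that a correctly rounded result does not depend on the order of the operands, whereas a plain left-to-right sum does.

```python
import itertools
import math


def naive_sum(values):
    """Plain left-to-right floating-point summation (order dependent)."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values):
    """Correctly rounded sum via math.fsum.

    Assumption: math.fsum is used here only as a stand-in for an
    accurate summation algorithm; it is not the RARE-BLAS code.
    """
    return math.fsum(values)


# Three values whose exact sum is 1.0.
data = [1e16, 1.0, -1e16]

# Each permutation models a different reduction order, as might occur
# from one parallel run to another.
naive_results = {naive_sum(p) for p in itertools.permutations(data)}
exact_results = {reproducible_sum(p) for p in itertools.permutations(data)}

print("naive results:       ", sorted(naive_results))
print("reproducible results:", sorted(exact_results))
```

With these inputs the naive sum yields more than one distinct value across orderings (for instance, `(1e16 + 1.0) + -1e16` loses the `1.0` to rounding), while the correctly rounded sum yields a single value for every ordering.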