Parallel experiments with RARE-BLAS
Univ. Perpignan Via Domitia, Digits, Architectures et Logiciels Informatiques, F-66860, Perpignan
18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2016
Numerical reproducibility failures arise in parallel computation because floating-point summation is not associative. Optimizations on massively parallel systems dynamically modify the order of floating-point operations, so numerical results may change from one run to another. We propose to ensure reproducibility by extending, as far as possible, the IEEE-754 correct rounding property to larger sequences of operations. Our RARE-BLAS (Reproducible, Accurately Rounded and Efficient BLAS) benefits from recent accurate and efficient summation algorithms. Solutions are provided for level 1 (asum, dot and nrm2) and level 2 (gemv) routines. We compare their performance to the Intel MKL library and to other existing reproducible algorithms. On both shared and distributed memory parallel systems, we observe an extra cost of 2x in the worst case, which is acceptable for a wide range of applications. On the Intel Xeon Phi accelerator a larger extra cost (4x to 6x) is observed, which is still useful at least for debugging and validation.
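To make the reproducibility problem concrete (this example is illustrative only and is not taken from the paper or the RARE-BLAS code), the C snippet below sums the same three doubles in two different orders and obtains two different results; a parallel reduction whose evaluation order changes between runs is exposed to exactly this effect.

```c
/* Floating-point addition is not associative: the same three operands summed
 * in two orders yield two different double results. A parallel reduction that
 * reorders its partial sums between runs can therefore change its output. */
#include <stdio.h>

int main(void) {
    double a = 9007199254740992.0;  /* 2^53; above this value consecutive doubles differ by 2 */
    double b = 1.0;
    double c = 1.0;

    double left  = (a + b) + c;  /* a + b rounds back to a, so both ones are lost */
    double right = a + (b + c);  /* b + c = 2.0, and a + 2.0 is exactly representable */

    printf("(a + b) + c = %.0f\n", left);   /* 9007199254740992 */
    printf("a + (b + c) = %.0f\n", right);  /* 9007199254740994 */
    return 0;
}
```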
August 4, 2016