Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor
Grenoble Institute of Technology, France
HAL: hal-00878325 (29 October 2013)
@inproceedings{lima2013preliminary,
title={Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor},
author={Lima, Joao Vicente Ferreira and Broquedis, Francois and Gautier, Thierry and Raffin, Bruno and others},
booktitle={25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)},
year={2013}
}
This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus and XKaapi on the same benchmark suite, and we compare an Intel Xeon Phi coprocessor with a Sandy Bridge Xeon-based machine. Our benchmark suite is composed of three computing kernels: a Fibonacci computation that lets us study the overhead and scalability of the runtime system, an NQueens application that generates irregular and dynamic tasks, and a Cholesky factorization algorithm. We also compare the Cholesky factorization with the parallel algorithm provided by the Intel MKL library for Intel Xeon Phi. The performance evaluation shows that our XKaapi data-flow parallel programming environment incurs the lowest overhead of the three and is highly competitive with the native OpenMP and CilkPlus environments on Xeon Phi. Moreover, its efficient handling of data-flow dependencies between tasks lets XKaapi expose more parallelism for some applications, such as the Cholesky factorization. In that case, we observe substantial gains over the state-of-the-art MKL with up to 180 hardware threads, including a 47% performance increase for 60 hardware threads.
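The reason a Fibonacci kernel is useful for measuring runtime overhead can be illustrated with a small sketch. This is plain sequential Python, not XKaapi, OpenMP or CilkPlus code, and the function name `fib_with_task_count` is hypothetical. It assumes the benchmark uses the usual naive recursion, in which every recursive call would become one task in a task-based runtime.

```python
def fib_with_task_count(n):
    """Return (fib(n), number of calls) for the naive recursion.

    Assumption: each call stands in for one task that a task-based
    runtime would have to create and schedule.
    """
    if n < 2:
        # Leaf call: no children, but it still counts as one task.
        return n, 1
    a, tasks_a = fib_with_task_count(n - 1)
    b, tasks_b = fib_with_task_count(n - 2)
    # The only real work per task is this single addition.
    return a + b, tasks_a + tasks_b + 1


value, tasks = fib_with_task_count(20)
print(value, tasks)  # 6765 21891
```

Each task performs only one addition, so the cost of creating and scheduling tasks dominates the running time. Under that assumption, the kernel mostly measures the runtime system rather than the computation itself.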
November 11, 2013 by hgpu