Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor
Intel Corporation, Pipers Way, Swindon, Wiltshire SN3 1RJ, United Kingdom
Journal of Computers, Vol. 9, No. 7, 2014
@article{gepner2014evaluation,
title={Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor},
author={Gepner, Pawel and Gamayunov, Victor and Fraser, David L. and Houdard, Eric and Sauge, Ludovic and Declat, Damien and Dubois, Mathieu},
journal={Journal of Computers},
volume={9},
number={7},
year={2014}
}
In this paper we present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon Phi Coprocessor. We discuss a DGEMM implementation that runs "natively" on the coprocessor, minimizing communication with the host CPU. We also run DGEMM natively across a range of matrix sizes using the Intel Math Kernel Library (MKL). Our optimizations are designed to maximize reuse of the on-die cache, which significantly reduces data transfers from GDDR memory. Finally, we analyze the improvements to a classic matrix multiplication implementation based on the Cauchy algorithm and compare them with the latest results achieved using the Intel MKL DGEMM subroutine.
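The abstract contrasts a classic matrix multiplication with a cache-aware DGEMM. The C sketch below illustrates the general idea of loop tiling (cache blocking) for on-die cache reuse. It is not the authors' Xeon Phi kernel and not Intel MKL's DGEMM. The function names `dgemm_naive` and `dgemm_blocked` and the tile size `BLOCK` are hypothetical. Matrices are assumed row-major, and the naive triple loop is assumed to correspond to what the abstract calls the Cauchy algorithm (each element of C is a row-by-column dot product). Vectorization and threading, which a real coprocessor kernel would need, are deliberately omitted.

```c
#include <stddef.h>

/* Hypothetical tile edge length (in elements). A real kernel would tune this
   to the target's cache sizes; 64 is only a placeholder, not from the paper. */
#define BLOCK 64

/* Naive DGEMM, row-major: C = alpha*A*B + beta*C,
   with A (m x k), B (k x n), C (m x n).
   Each C element is the dot product of a row of A and a column of B. */
void dgemm_naive(size_t m, size_t n, size_t k, double alpha,
                 const double *A, const double *B, double beta, double *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-blocked DGEMM computing the same result. The i, p and j loops are
   tiled so that BLOCK x BLOCK tiles of A, B and C are reused while they are
   likely still resident in cache, instead of streaming whole rows/columns. */
void dgemm_blocked(size_t m, size_t n, size_t k, double alpha,
                   const double *A, const double *B, double beta, double *C)
{
    /* Scale C by beta once, then accumulate alpha*A*B tile by tile. */
    for (size_t i = 0; i < m * n; ++i)
        C[i] *= beta;

    for (size_t ii = 0; ii < m; ii += BLOCK)
        for (size_t pp = 0; pp < k; pp += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                /* Clamp tile bounds so sizes need not be multiples of BLOCK. */
                size_t ie = min_sz(ii + BLOCK, m);
                size_t pe = min_sz(pp + BLOCK, k);
                size_t je = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < ie; ++i)
                    for (size_t p = pp; p < pe; ++p) {
                        double a = alpha * A[i * k + p];
                        /* Innermost loop walks contiguous rows of B and C. */
                        for (size_t j = jj; j < je; ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
            }
}
```

Both functions compute C = αAB + βC; calling each on separate copies of the same C should give matching results, up to floating-point rounding from the different order of operations. The i-p-j ordering inside each tile is one common choice because it keeps the innermost accesses to B and C contiguous in a row-major layout.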
June 28, 2014 by hgpu