Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor
Intel Corporation, Pipers Way, Swindon Wiltshire SN3 1RJ, United Kingdom
Journal of Computers, Vol. 9, No. 7, 2014
In this paper we present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon Phi Coprocessor. We discuss a DGEMM implementation that runs "natively" on the coprocessor, minimizing communication with the host CPU. We run DGEMM natively across a range of matrix sizes, both with our own implementation and with the Intel Math Kernel Library. Our optimizations are designed to maximize reuse of data in the on-die cache, which significantly reduces data transfers from GDDR memory. Finally, we analyze how far a classic matrix multiplication implementation based on the Cauchy algorithm can be improved, and compare it with the latest results achieved using the Intel Math Kernel Library DGEMM subroutine.
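To make the cache-reuse idea concrete, the following is a minimal sketch in C, not the authors' implementation. It contrasts a classic triple-loop multiplication with a blocked (tiled) variant that works on small sub-matrices so that each tile can stay in cache while it is reused. The function names, the row-major square layout, and the block size parameter `bs` are assumptions introduced for illustration only; the abstract does not state which blocking scheme or tile sizes the paper uses.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Classic triple loop: C += A * B for row-major n x n matrices.
   Each element of C is the dot product of a row of A and a column of B. */
void dgemm_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Blocked variant: C += A * B, processed in bs x bs tiles.
   bs is a hypothetical tuning parameter; in practice it would be chosen
   so that the tiles of A, B and C being worked on fit in cache together. */
void dgemm_blocked(size_t n, size_t bs,
                   const double *A, const double *B, double *C) {
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = (jj + bs < n) ? jj + bs : n;
                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}

/* Largest element-wise absolute difference between two n x n matrices. */
double max_abs_diff(size_t n, const double *X, const double *Y) {
    double m = 0.0;
    for (size_t i = 0; i < n * n; ++i) {
        double d = fabs(X[i] - Y[i]);
        if (d > m)
            m = d;
    }
    return m;
}
```

Both routines compute the same result; the blocked version only changes the order in which elements are visited. On real hardware the tile size would need to be tuned, and a production kernel would also add vectorization and threading, which this sketch leaves out.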
June 28, 2014 by hgpu