Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor
Department of Mathematical Sciences, Seoul National University
Seoul National University, 2018
@phdthesis{kim2018implementing,
title={Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor},
author={Kim, Raehyun},
year={2018},
school={Seoul National University}
}
This paper presents the design and implementation of a general matrix-matrix multiplication (GEMM) algorithm for the second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). We present several development guidelines for achieving optimal performance with the C programming language and the Advanced Vector Extensions 512 (AVX-512) instruction set. We also discuss several issues with environment variables that arise when parallelizing on the KNL. On a single core of the KNL, our double-precision GEMM (DGEMM) implementation achieves up to 99 percent of the DGEMM performance of the Intel MKL, the current state-of-the-art library. Our parallel implementation on 68 cores of the KNL also scales well, achieving up to 93 percent of the Intel MKL's DGEMM performance.
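For readers unfamiliar with the operation, the following is a minimal sketch of DGEMM (C += A * B) in portable C, with a reference triple loop and a simple cache-blocked variant. It is not the implementation described in the thesis: the function names, the row-major layout, and the block sizes BLOCK_M, BLOCK_N, and BLOCK_K are assumptions chosen for illustration, and no AVX-512 intrinsics or threading are used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes for illustration; not taken from the thesis. */
enum { BLOCK_M = 32, BLOCK_N = 32, BLOCK_K = 64 };

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Reference DGEMM: C += A * B, all matrices row-major.
 * A is m x k, B is k x n, C is m x n. */
void dgemm_naive(size_t m, size_t n, size_t k,
                 const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = c[i * n + j];
            for (size_t p = 0; p < k; ++p)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Cache-blocked DGEMM: same result as dgemm_naive, but the loops are
 * tiled so that each step works on sub-blocks of A, B, and C.
 * The innermost loop runs over j with unit stride in B and C, which
 * gives the compiler a chance to vectorize it. */
void dgemm_blocked(size_t m, size_t n, size_t k,
                   const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i0 = 0; i0 < m; i0 += BLOCK_M) {
        size_t i1 = min_size(i0 + BLOCK_M, m);
        for (size_t p0 = 0; p0 < k; p0 += BLOCK_K) {
            size_t p1 = min_size(p0 + BLOCK_K, k);
            for (size_t j0 = 0; j0 < n; j0 += BLOCK_N) {
                size_t j1 = min_size(j0 + BLOCK_N, n);
                /* Update the C block [i0,i1) x [j0,j1) using the
                 * A block [i0,i1) x [p0,p1) and B block [p0,p1) x [j0,j1). */
                for (size_t i = i0; i < i1; ++i) {
                    for (size_t p = p0; p < p1; ++p) {
                        double a_ip = a[i * k + p];
                        for (size_t j = j0; j < j1; ++j)
                            c[i * n + j] += a_ip * b[p * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into C, so the caller should zero C first if a plain product is wanted. In a tuned implementation the block sizes would be chosen to match the cache hierarchy of the target processor, and the innermost block update would typically be replaced by a hand-written vectorized kernel.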
June 13, 2018 by hgpu