https://hgpu.org/?p=18273
Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor