high performance computing on graphics processing units: hgpu.org


Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor

Raehyun Kim

Department of Mathematical Sciences, Seoul National University

Seoul National University, 2018

@phdthesis{kim2018implementing,
  title={Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor},
  author={Kim, Raehyun},
  year={2018}
}


This paper presents the design and implementation of a general matrix-matrix multiplication (GEMM) algorithm for the second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). We describe several development guidelines for achieving optimal performance with the C programming language and the Advanced Vector Extensions (AVX-512) instruction set. We also discuss several issues with environment variables that affect parallelization on the KNL. On a single core of the KNL, our double-precision GEMM (DGEMM) implementation achieves up to 99 percent of the performance of the DGEMM in Intel MKL, the current state-of-the-art library. Our parallel implementation on 68 cores of the KNL also scales well, reaching up to 93 percent of the performance of the Intel MKL DGEMM.
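For readers unfamiliar with the operation, the following is a minimal sketch of DGEMM in portable C, computing C += A * B for row-major matrices. It is not the thesis's kernel: it uses no AVX-512 intrinsics, and the function names (`dgemm_naive`, `dgemm_blocked`) and the block-size parameter `bs` are hypothetical, introduced here only for illustration. The blocked variant shows the general idea of tiling the loops so that sub-blocks of the operands can be reused from cache, which is a common starting point for optimized GEMM implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop: C += A * B.
 * All matrices are row-major; A is m x k, B is k x n, C is m x n. */
void dgemm_naive(size_t m, size_t n, size_t k,
                 const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t p = 0; p < k; ++p) {
            const double a = A[i * k + p];
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[p * n + j];
        }
    }
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-blocked variant: the same arithmetic, but the three loops are
 * tiled into bs x bs blocks. bs is a hypothetical tuning parameter,
 * not a value taken from the thesis. */
void dgemm_blocked(size_t m, size_t n, size_t k, size_t bs,
                   const double *A, const double *B, double *C)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < m; ii += bs) {
        const size_t i_end = min_size(ii + bs, m);
        for (size_t pp = 0; pp < k; pp += bs) {
            const size_t p_end = min_size(pp + bs, k);
            for (size_t jj = 0; jj < n; jj += bs) {
                const size_t j_end = min_size(jj + bs, n);
                /* Multiply one block of A by one block of B. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t p = pp; p < p_end; ++p) {
                        const double a = A[i * k + p];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into C, so the caller is expected to zero C first when a plain product is wanted. A tuned kernel would typically go further, for example by packing blocks into contiguous buffers and vectorizing the innermost loop, but those steps are outside this sketch.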

Tags: Algorithms, Computer science, Intel Xeon Phi, Linear Algebra, Matrix multiplication, Optimization

June 13, 2018 by hgpu

