high performance computing on graphics processing units: hgpu.org


Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor

Pawel Gepner, Victor Gamayunov, David L. Fraser, Eric Houdard, Ludovic Sauge, Damien Declat, Mathieu Dubois

Intel Corporation, Pipers Way, Swindon Wiltshire SN3 1RJ, United Kingdom

Journal of Computers, Vol. 9, No. 7, 2014

DOI:10.4304/jcp.9.7.1566-1571

@article{gepner2014evaluation,
  title={Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor},
  author={Gepner, Pawel and Gamayunov, Victor and Fraser, David L. and Houdard, Eric and Sauge, Ludovic and Declat, Damien and Dubois, Mathieu},
  year={2014}
}


In this paper we present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon Phi Coprocessor. We discuss a DGEMM implementation running "natively" on the coprocessor, which minimizes communication with the host CPU. We run DGEMM across a range of matrix sizes natively, as well as using the Intel Math Kernel Library. Our optimizations are designed to maximize reuse of on-die cache, which significantly reduces data transfer from GDDR memory. Finally, we analyze the improvement of a classic matrix multiplication implementation based on the Cauchy algorithm, compared to the latest results achieved using the Intel Math Kernel Library DGEMM subroutine.
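The abstract does not reproduce the paper's kernel, so the following is only a minimal sketch of the general idea behind cache reuse in matrix multiplication: loop tiling (blocking), compared against a plain triple-loop product. The function names `matmul_naive` and `matmul_blocked`, the `block` parameter, and the list-of-lists layout are assumptions made for illustration; they are not taken from the paper or from the Intel Math Kernel Library, and pure Python will not show the performance effect, only the loop structure.

```python
def matmul_naive(a, b):
    """Plain triple-loop product C = A * B on lists of lists (reference version)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=2):
    """Tiled product: same arithmetic as matmul_naive, visited tile by tile.

    `block` is a hypothetical tile edge length. In a real kernel it would be
    chosen so that the tiles of A, B and C being worked on fit in cache together.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile.
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            # Accumulate the partial product for this tile.
                            row_c[j] += a_ip * row_b[j]
    return c
```

Both functions compute the same product; the blocked version only changes the order in which elements are visited, so that each small tile is reused several times before the loops move on. For example, `matmul_blocked([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` gives the same result as `matmul_naive` on the same inputs.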

Tags: Algorithms, Computer science, Intel Xeon Phi, Linear Algebra, Matrix multiplication, Performance

June 28, 2014 by hgpu


