high performance computing on graphics processing units: hgpu.org

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Erik Saule, Kamer Kaya, Umit V. Catalyurek

The Ohio State University, Department of Biomedical Informatics

arXiv:1302.1078 [cs.PF], (5 Feb 2013)

@article{2013arXiv1302.1078S,
  author={{Saule}, E. and {Kaya}, K. and {Catalyurek}, U.~V.},
  title="{Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi}",
  journal={ArXiv e-prints},
  archivePrefix="arXiv",
  eprint={1302.1078},
  primaryClass="cs.PF",
  keywords={Computer Science – Performance, Computer Science – Hardware Architecture},
  year={2013},
  month=feb,
  adsurl={http://adsabs.harvard.edu/abs/2013arXiv1302.1078S},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

Intel Xeon Phi is a recently released high-performance coprocessor that features 61 cores, each supporting 4 hardware threads and 512-bit wide SIMD registers, for a peak theoretical performance of 1 Tflop/s in double precision. Many scientific applications, such as linear solvers, eigensolvers, and graph mining algorithms, involve operations on large sparse matrices. The core of most of these applications is the multiplication of a large, sparse matrix with a dense vector (SpMV). In this paper, we investigate the performance of the Xeon Phi coprocessor for SpMV. We first provide a comprehensive introduction to this new architecture and analyze its peak performance with a number of micro-benchmarks. Although the design of a Xeon Phi core is not much different from that of the cores in modern processors, its large number of cores and hyperthreading capability allow many applications to saturate the available memory bandwidth, which is not the case for many cutting-edge processors. Yet our performance studies show that it is the memory latency, not the bandwidth, that creates a bottleneck for SpMV on this architecture. Finally, our experiments show that Xeon Phi’s sparse kernel performance is very promising and even better than that of cutting-edge general-purpose processors and GPUs.
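
For readers unfamiliar with the SpMV kernel named in the abstract, the following is a minimal sketch in plain Python. It assumes the matrix is stored in compressed sparse row (CSR) form, a common choice that the abstract itself does not specify. The function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative, not taken from the paper. The sketch shows only the arithmetic of the kernel and makes no attempt at the threading or SIMD vectorization the paper evaluates.

```python
from typing import List, Sequence


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    Assumed layout (illustrative, not from the paper):
      row_ptr[i] .. row_ptr[i + 1] - 1 index the nonzeros of row i,
      col_idx[k] is the column of the k-th nonzero,
      values[k] is its numerical value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Walk the nonzeros of row i; x is read through col_idx,
        # so accesses to x are indirect and generally irregular.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small example matrix:
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [7.0, 6.0, 17.0]
```

Each nonzero contributes one multiply and one add, while the read of `x[col_idx[k]]` depends on an index that must itself be loaded first. This dependent, scattered access pattern is one plausible reason a kernel of this shape can be sensitive to memory latency as well as bandwidth.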

Tags: Computer science, CUDA, Hardware Architecture, Intel, Intel Phi, Matrix multiplication, nVidia, Performance, Sparse matrix, Tesla C2050, Tesla K20

February 6, 2013 by hgpu
