8893

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Erik Saule, Kamer Kaya, Umit V. Catalyurek
The Ohio State University, Department of Biomedical Informatics
arXiv:1302.1078 [cs.PF], (5 Feb 2013)
@article{2013arXiv1302.1078S,

   author={Saule}, E. and {Kaya}, K. and {Catalyurek}, U.~V.},

   title={"{Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi}"},

   journal={ArXiv e-prints},

   archivePrefix={"arXiv"},

   eprint={1302.1078},

   primaryClass={"cs.PF"},

   keywords={Computer Science – Performance, Computer Science – Hardware Architecture},

   year={2013},

   month={feb},

   adsurl={http://adsabs.harvard.edu/abs/2013arXiv1302.1078S},

   adsnote={Provided by the SAO/NASA Astrophysics Data System}

}

Download Download (PDF)   View View   Source Source   

1307

views

Intel Xeon Phi is a recently released high-performance coprocessor which features 61 cores each supporting 4 hardware threads with 512-bit wide SIMD registers achieving a peak theoretical performance of 1Tflop/s in double precision. Many scientific applications involve operations on large sparse matrices such as linear solvers, eigensolver, and graph mining algorithms. The core of most of these applications involves the multiplication of a large, sparse matrix with a dense vector (SpMV). In this paper, we investigate the performance of the Xeon Phi coprocessor for SpMV. We first provide a comprehensive introduction to this new architecture and analyze its peak performance with a number of micro benchmarks. Although the design of a Xeon Phi core is not much different than those of the cores in modern processors, its large number of cores and hyperthreading capability allow many application to saturate the available memory bandwidth, which is not the case for many cutting-edge processors. Yet, our performance studies show that it is the memory latency not the bandwidth which creates a bottleneck for SpMV on this architecture. Finally, our experiments show that Xeon Phi’s sparse kernel performance is very promising and even better than that of cutting-edge general purpose processors and GPUs.
VN:F [1.9.22_1171]
Rating: 3.0/5 (2 votes cast)
Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi, 3.0 out of 5 based on 2 ratings

* * *

* * *

Like us on Facebook

HGPU group

193 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1329 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: