high performance computing on graphics processing units: hgpu.org

A unified sparse matrix data format for modern processors with wide SIMD units

Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, Alan R. Bishop

Erlangen Regional Computing Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, D-91058 Erlangen, Germany

arXiv:1307.6209 [cs.MS], (23 Jul 2013)

@article{2013arXiv1307.6209K,
  author={{Kreutzer}, M. and {Hager}, G. and {Wellein}, G. and {Fehske}, H. and {Bishop}, A.~R.},
  title={A unified sparse matrix data format for modern processors with wide SIMD units},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1307.6209},
  primaryClass={cs.MS},
  keywords={Computer Science – Mathematical Software, Computer Science – Distributed, Parallel, and Cluster Computing},
  year={2013},
  month={jul},
  adsurl={http://adsabs.harvard.edu/abs/2013arXiv1307.6209K},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}


Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from General Purpose Graphics Processing Units (GPGPUs) and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage (CRS) and ELLPACK, and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
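The abstract names the format and its two tuning parameters but does not spell out the layout. The following is a minimal Python sketch of how a SELL-C-sigma-style structure is commonly described: rows are sorted by length within windows of sigma rows, grouped into chunks of C rows, and each chunk is zero-padded to its longest row and stored column-major so that C consecutive entries map onto SIMD lanes. The function names (`csr_to_sell`, `sell_spmv`), the CSR input arrays, and the tiny test matrix are hypothetical and chosen for illustration only; this is not the authors' implementation, and a real kernel would be vectorized rather than written as scalar loops.

```python
def csr_to_sell(n_rows, row_ptr, col_idx, values, C=2, sigma=4):
    """Build an illustrative SELL-C-sigma-like layout from CSR arrays."""
    row_len = [row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)]

    # Sort rows by descending length, but only inside windows of sigma rows.
    perm = []
    for start in range(0, n_rows, sigma):
        window = list(range(start, min(start + sigma, n_rows)))
        window.sort(key=lambda r: -row_len[r])
        perm.extend(window)

    chunk_ptr = [0]  # start offset of each chunk in val/col
    chunk_len = []   # padded width (longest row) of each chunk
    val, col = [], []
    for start in range(0, n_rows, C):
        rows = perm[start:start + C]
        width = max(row_len[r] for r in rows)
        chunk_len.append(width)
        # Column-major inside the chunk: the j-th entries of all C rows
        # are contiguous, which is what makes the layout SIMD-friendly.
        for j in range(width):
            for lane in range(C):
                if lane < len(rows) and j < row_len[rows[lane]]:
                    k = row_ptr[rows[lane]] + j
                    val.append(values[k])
                    col.append(col_idx[k])
                else:
                    val.append(0.0)  # zero padding
                    col.append(0)    # any valid column index works here
        chunk_ptr.append(len(val))
    return perm, chunk_ptr, chunk_len, val, col


def sell_spmv(n_rows, C, perm, chunk_ptr, chunk_len, val, col, x):
    """Compute y = A @ x from the layout above (scalar reference loop)."""
    y = [0.0] * n_rows
    for c, width in enumerate(chunk_len):
        base = chunk_ptr[c]
        for lane in range(C):  # one lane per row of the chunk
            pos = c * C + lane
            if pos >= n_rows:
                continue  # padded lane in the last chunk
            acc = 0.0
            for j in range(width):
                k = base + j * C + lane
                acc += val[k] * x[col[k]]
            y[perm[pos]] = acc  # undo the row permutation
    return y


# Hypothetical 5x5 test matrix in CSR form (row 3 is empty).
row_ptr = [0, 2, 3, 7, 7, 8]
col_idx = [0, 2, 1, 0, 1, 2, 4, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [1, 2, 3, 4, 5]

perm, chunk_ptr, chunk_len, val, col = csr_to_sell(5, row_ptr, col_idx, values)
y = sell_spmv(5, 2, perm, chunk_ptr, chunk_len, val, col, x)
```

In this sketch, C = 2 and sigma = 4 are arbitrary small values chosen so the example fits on a page. With sigma = 1 no sorting takes place, and with C equal to the number of rows the whole matrix is padded to its longest row, as in ELLPACK.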

Tags: Algorithms, Computer science, CUDA, Heterogeneous systems, Intel Phi, Mathematical Software, nVidia, Sparse matrix, Tesla K20

July 25, 2013 by hgpu
