high performance computing on graphics processing units: hgpu.org

Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-sigma formats on NVIDIA GPUs

Hartwig Anzt, Stanimire Tomov, Jack Dongarra

Innovative Computing Lab, University of Tennessee, Knoxville, USA

Innovative Computing Lab, University of Tennessee, Technical Report UT-EECS-14-727, 2014

Numerical methods in sparse linear algebra typically rely on a fast and efficient matrix-vector product, as this is usually the backbone of iterative algorithms for solving eigenvalue problems or linear systems. Given the large diversity in the characteristics of high-performance computer architectures, it is a challenge to derive a storage format that is efficient across platforms, along with fast matrix-vector kernels. Recently, attention has focused on the SELL-C-sigma format, a sliced ELLPACK format enhanced by row sorting to reduce the fill-in when padding rows with zeros. In this paper we propose an additional modification resulting in the padded sliced ELLPACK (SELLP) format, for which we develop a sparse matrix-vector CUDA kernel that is able to efficiently exploit the computing power of NVIDIA GPUs. We show that the kernel we developed outperforms straightforward implementations for the widespread CSR and ELLPACK formats, and is highly competitive with the implementations in the highly optimized CUSPARSE library.
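
To make the storage idea in the abstract concrete, here is a minimal CPU sketch in Python of a sliced ELLPACK layout with row sorting, plus a reference matrix-vector product over it. It is an illustration only, not the authors' CUDA kernel, and it does not reproduce the SELLP modification, whose details are not given in the abstract. The function names (`build_sell_c_sigma`, `spmv`, `stored_entries`) and the list-of-`(column, value)` input layout are assumptions made for this example; `C` is the slice height and `sigma` the sorting window.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build a sliced ELLPACK layout with row sorting (illustrative sketch).

    rows  : list of rows, each a list of (column, value) pairs
    C     : number of rows per slice
    sigma : rows are sorted by length inside windows of this size
    """
    n = len(rows)
    # Sort rows by descending length within each window of sigma rows,
    # so that rows of similar length end up in the same slice.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: len(rows[r]), reverse=True)
        perm.extend(window)

    slices = []
    for start in range(0, n, C):
        ids = perm[start:start + C]
        # Each slice is padded only to the length of its own longest row.
        width = max(len(rows[r]) for r in ids)
        # Column-major within the slice: entry j of every row is adjacent.
        cols = [[0] * len(ids) for _ in range(width)]
        vals = [[0.0] * len(ids) for _ in range(width)]
        for i, r in enumerate(ids):
            for j, (c, v) in enumerate(rows[r]):
                cols[j][i] = c
                vals[j][i] = v
        slices.append((ids, cols, vals))
    return slices


def spmv(slices, x, n):
    """Reference y = A @ x over the sliced layout (padding contributes 0)."""
    y = [0.0] * n
    for ids, cols, vals in slices:
        for j in range(len(cols)):
            for i, r in enumerate(ids):
                y[r] += vals[j][i] * x[cols[j][i]]
    return y


def stored_entries(slices):
    """Count stored values, including zero padding."""
    return sum(len(cols) * len(ids) for ids, cols, _ in slices)


# Small 4x4 example with alternating long and short rows (8 nonzeros).
rows = [
    [(0, 1.0), (1, 2.0), (3, 3.0)],
    [(1, 4.0)],
    [(0, 5.0), (2, 6.0), (3, 7.0)],
    [(2, 8.0)],
]
x = [1, 2, 3, 4]

unsorted_layout = build_sell_c_sigma(rows, C=2, sigma=1)  # no sorting
sorted_layout = build_sell_c_sigma(rows, C=2, sigma=4)    # sort all rows
y = spmv(sorted_layout, x, len(rows))
```

In this toy matrix, slices of two rows without sorting store 12 entries for 8 nonzeros, because each slice contains one long and one short row. Sorting over a window of four rows groups the long rows together, which brings the stored count down to 8 with no padding. The product is the same in both cases, since the original row index travels with each slice.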

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, Sorting, Sparse matrix, Tesla K40

April 7, 2014 by hgpu
