high performance computing on graphics processing units: hgpu.org


A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU

A. Dziekonski, A. Lamecki, M. Mrozowski

WiComm Center of Excellence, Faculty of Electronics, Telecommunications and Informatics (ETI), Gdansk University of Technology (GUT), Gdansk 80-233, Poland

Progress In Electromagnetics Research, Vol. 116, p.49-63, 2011

DOI:10.2528/PIER11031607

@article{dziekonski2011memory,
  title={A Memory Efficient and Fast Sparse Matrix Vector Product on a {GPU}},
  author={Dziekonski, A. and Lamecki, A. and Mrozowski, M.},
  journal={Progress In Electromagnetics Research},
  volume={116},
  pages={49--63},
  year={2011},
  publisher={EMW Publishing}
}


This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it combines a low memory footprint with good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative solution of large, sparse, complex-valued systems of linear equations arising in computational electromagnetics. Numerical tests have shown that the performance of the new implementation reaches 69 GFLOPS in complex single-precision arithmetic. Compared with an optimized implementation on a six-core Central Processing Unit (CPU) (Intel Xeon 5680), this corresponds to a speedup by a factor of six. In terms of speed, the new format is as fast as the best format published so far, and at the same time it does not introduce redundant zero elements that have to be stored to ensure fast memory access. Compared to previously published solutions, significantly larger problems can be handled using low-cost commodity GPUs with a limited amount of on-board memory.
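The abstract does not spell out the storage layout, so the following is only a minimal sketch of the general idea behind sliced ELLPACK-style formats with a row-length array, written in plain Python rather than CUDA. It is not the authors' Sliced ELLR-T implementation. The names `build_sliced_ell`, `spmv`, and `slice_size`, the column-major ordering inside a slice, and the per-slice zero padding are all assumptions made for illustration.

```python
# Illustrative sliced-ELL layout with a row-length array.
# Assumption: this is a generic sketch, not the paper's exact format.

def build_sliced_ell(rows, slice_size):
    """rows: list of rows, each a list of (column, value) pairs for the nonzeros."""
    values, columns, slice_ptr, row_len = [], [], [0], []
    for start in range(0, len(rows), slice_size):
        chunk = rows[start:start + slice_size]
        # Each slice is padded only to its own widest row, not the global maximum.
        width = max((len(r) for r in chunk), default=0)
        row_len.extend(len(r) for r in chunk)
        # Column-major inside the slice: entry k of every row is stored contiguously.
        for k in range(width):
            for r in chunk:
                if k < len(r):
                    columns.append(r[k][0])
                    values.append(r[k][1])
                else:
                    columns.append(0)   # padding slot, never read by spmv
                    values.append(0j)
        slice_ptr.append(len(values))
    return values, columns, slice_ptr, row_len


def spmv(fmt, x, slice_size):
    """Compute y = A @ x for a matrix stored by build_sliced_ell."""
    values, columns, slice_ptr, row_len = fmt
    n_rows = len(row_len)
    y = [0j] * n_rows
    for row, n in enumerate(row_len):
        s, local = divmod(row, slice_size)
        rows_in_slice = min(slice_size, n_rows - s * slice_size)
        base = slice_ptr[s]
        acc = 0j
        # The row-length array lets the loop stop before the padding.
        for k in range(n):
            idx = base + k * rows_in_slice + local
            acc += values[idx] * x[columns[idx]]
        y[row] = acc
    return y


# Small complex-valued 4x4 example with 7 nonzeros.
rows = [
    [(0, 2 + 1j), (1, 1j)],
    [(1, 3 + 0j)],
    [(0, 1 + 0j), (2, 4 - 1j), (3, 2 + 0j)],
    [(3, 5 + 0j)],
]
x = [1 + 0j, 1j, 2 + 0j, 1 - 1j]
fmt = build_sliced_ell(rows, slice_size=2)
y = spmv(fmt, x, slice_size=2)
print(y)
print("stored entries:", len(fmt[0]))
```

In this toy case the sliced layout stores 10 entries, whereas padding every row to the global maximum width of 3 would store 12. On a GPU, the column-major ordering inside a slice is what would let neighbouring threads read neighbouring memory locations; that mapping is not shown here.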

Tags: Computer science, CUDA, Electrodynamics, FEM, Finite element method, nVidia, nVidia GeForce GTX 480, Performance, Sparse matrix

October 26, 2011 by hgpu
