high performance computing on graphics processing units: hgpu.org


Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU

Yuan Tao, Huang Zhi-Bin

College of Mathematics, Jilin Normal University, Siping Jilin 136000, China

International Journal of Grid and Distributed Computing, Vol. 9, No. 10, pp.99-106, 2016

DOI:10.14257/ijgdc.2016.9.10.09

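@article{tao2016shuffle,
  title={Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU},
  author={Tao, Yuan and Zhi-Bin, Huang},
  journal={International Journal of Grid and Distributed Computing},
  volume={9},
  number={10},
  pages={99--106},
  year={2016},
  doi={10.14257/ijgdc.2016.9.10.09}
}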


GPUs are well suited to accelerating compute-intensive applications and achieving higher throughput in High Performance Computing (HPC). Sparse Matrix-Vector Multiplication (SpMV) is a core kernel in HPC, so SpMV throughput on the GPU can affect the throughput of the whole HPC platform. In this paper, we focus on the latency of the reduction routine in the SpMV implementation included in CUSP, which comes from shared-memory accesses and from bank conflicts when multiple threads access the same bank simultaneously. We propose a shuffle-based method that reduces the partial results with warp shuffle instructions instead of in shared memory, in order to improve SpMV throughput on Kepler GPUs. Experiments show that the shuffle method improves throughput by up to 9% on average over the original SpMV routine in CUSP.
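The abstract contrasts reducing per-thread partial sums in shared memory with reducing them through warp shuffles. The C++ sketch below is a CPU emulation for illustration only, not CUSP's code and not the authors' kernel. It assumes one 32-lane warp per matrix row (CSR "vector" style), emulates the semantics of CUDA's shuffle-down on an array of lane values, and combines the partial sums with offsets 16, 8, 4, 2, 1 so that lane 0 ends up holding the row sum. The names `CsrMatrix`, `shfl_down`, `warp_reduce_shfl` and `spmv_csr_vector` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kWarpSize = 32;

// Emulates CUDA's __shfl_down_sync over one full warp: lane i receives the
// value held by lane i + delta; the upper `delta` lanes keep their own value.
std::array<double, kWarpSize> shfl_down(const std::array<double, kWarpSize>& v, int delta) {
    std::array<double, kWarpSize> out{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        int src = lane + delta;
        out[lane] = (src < kWarpSize) ? v[src] : v[lane];
    }
    return out;
}

// Tree reduction with shuffle-down offsets 16, 8, 4, 2, 1.
// Only lane 0 is guaranteed to hold the full sum afterwards.
double warp_reduce_shfl(std::array<double, kWarpSize> v) {
    for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
        std::array<double, kWarpSize> other = shfl_down(v, offset);
        for (int lane = 0; lane < kWarpSize; ++lane) v[lane] += other[lane];
    }
    return v[0];
}

// Hypothetical CSR container: row_ptr has rows + 1 entries.
struct CsrMatrix {
    int rows;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> values;
};

// CSR-vector SpMV (assumed layout): one emulated warp per row. Each lane
// accumulates a strided partial sum over the row's nonzeros, then the
// partial sums are combined by the shuffle reduction instead of shared memory.
std::vector<double> spmv_csr_vector(const CsrMatrix& A, const std::vector<double>& x) {
    assert(A.row_ptr.size() == static_cast<std::size_t>(A.rows) + 1);
    std::vector<double> y(A.rows, 0.0);
    for (int row = 0; row < A.rows; ++row) {
        std::array<double, kWarpSize> partial{};
        for (int lane = 0; lane < kWarpSize; ++lane) {
            for (int k = A.row_ptr[row] + lane; k < A.row_ptr[row + 1]; k += kWarpSize)
                partial[lane] += A.values[k] * x[A.col_idx[k]];
        }
        y[row] = warp_reduce_shfl(partial);
    }
    return y;
}
```

On a real Kepler-or-later GPU, each emulated step corresponds to a register-to-register exchange such as `sum += __shfl_down_sync(0xffffffff, sum, offset);` (CUDA 9 and later; Kepler-era code used the mask-free `__shfl_down`). The reduction therefore never touches shared memory, which is the source of latency and bank conflicts that the abstract targets.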

Tags: Algorithms, Computer science, CUDA, nVidia, Sparse matrix, Tesla K20

November 10, 2016 by hgpu

