high performance computing on graphics processing units: hgpu.org

Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU

Yuan Tao, Huang Zhi-Bin

College of Mathematics, Jilin Normal University, Siping Jilin 136000, China

International Journal of Grid and Distributed Computing, Vol. 9, No. 10, pp.99-106, 2016

DOI:10.14257/ijgdc.2016.9.10.09

GPUs are well suited to accelerating compute-intensive applications and raising throughput in High Performance Computing (HPC). Sparse Matrix-Vector Multiplication (SpMV) is a core HPC algorithm, so its throughput on the GPU can affect the throughput of the whole HPC platform. In this paper, we focus on the latency of the reduction routine in the SpMV implementation included in CUSP, which comes from shared memory accesses and from bank conflicts that occur when multiple threads access the same bank simultaneously. We use the shuffle method to reduce the partial results, instead of reducing them in shared memory, in order to improve SpMV throughput on Kepler GPUs. Experiments show that the shuffle method improves throughput over the original CUSP SpMV routine by up to 9% on average.
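The reduction pattern described in the abstract can be illustrated with a small CPU emulation. This is a minimal sketch and not the authors' kernel or CUSP code: it assumes a CSR matrix processed with one 32-lane warp per row, and the helper names (`shfl_down`, `warp_reduce_sum`, `spmv_csr_vector`) are hypothetical. `shfl_down` mimics the semantics of CUDA's shuffle-down intrinsic (`__shfl_down` on Kepler-era toolkits, `__shfl_down_sync` in current ones), where each lane reads a register value directly from the lane `delta` positions above it, so no shared memory is touched.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def shfl_down(values, delta):
    """Emulate a warp shuffle-down: lane i reads the value held by lane i + delta.

    Lanes whose source lane would be out of range keep their own value,
    which matches the behaviour of the CUDA intrinsic.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Tree reduction over one warp; after 5 steps lane 0 holds the full sum."""
    assert len(values) == WARP_SIZE
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]  # only lane 0 is guaranteed to hold the total


def spmv_csr_vector(row_ptr, col_idx, data, x):
    """CSR SpMV with one emulated warp per row (assumed layout, for illustration)."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * WARP_SIZE
        # Each lane accumulates a strided slice of the row's nonzeros.
        for lane in range(WARP_SIZE):
            for k in range(start + lane, end, WARP_SIZE):
                partial[lane] += data[k] * x[col_idx[k]]
        # The per-lane partial results are combined by shuffles, not shared memory.
        y.append(warp_reduce_sum(partial))
    return y


# 3x3 example: [[1, 0, 2], [0, 3, 0], [4, 0, 5]] times [1, 2, 3]
print(spmv_csr_vector([0, 2, 3, 5], [0, 2, 1, 0, 2],
                      [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
# prints [7.0, 6.0, 19.0]
```

On a real GPU the five shuffle steps replace the shared memory writes, reads and synchronization of a conventional in-warp reduction, which is the source of latency the abstract points to.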

Tags: Algorithms, Computer science, CUDA, nVidia, Sparse matrix, Tesla K20

November 10, 2016 by hgpu
