high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms

Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms

Kevin Townsend, Joseph Zambreno

Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA

International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2013

@conference{TowZam13A,

title={Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms},

booktitle={Proceedings of the International Conference on Application-specific Systems, Architectures and Processors (ASAP)},

year={2013},

month={June},

author={Kevin Townsend and Joseph Zambreno}

}

Download (PDF)

View

Source

1376

views

Sparse Matrix Vector Multiplication (SpMV) is an important computational kernel in many scientific computing applications. Pipelining multiply-accumulate operations shifts SpMV from a computationally bounded kernel to an I/O bounded kernel. In this paper, we propose a design methodology and hardware architecture for SpMV that seeks to utilize system memory bandwidth as efficiently as possible, by Reducing the matrix element storage with on-chip decompression hardware, Reusing the vector data by mixing row and column matrix traversal, and Recycling data with matrix-dependent on-chip storage. Our experimental results with a Convey HC-1/HC-2 reconfigurable computing system indicate that for certain sparse matrices, our R^3 methodology performs twice as fast as previous reconfigurable implementations, and effectively competes against other platforms.

Tags: Computer science, CUDA, nVidia, Sparse matrix, Tesla M2090, Tesla S1070

April 23, 2013 by hgpu

No votes yet.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

high performance computing on graphics processing units: hgpu.org

Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)

Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms

Share this:

Recent source codes

Most viewed papers (last 30 days)