high performance computing on graphics processing units: hgpu.org


Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs

Noboru Tanabe, Yuuka Ogawa, Masami Takata, Kazuki Joe

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2011

DOI:10.1109/PDP.2011.92

@conference{tanabe2011scaleable,
  title={Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs},
  author={Tanabe, N. and Ogawa, Y. and Takata, M. and Joe, K.},
  booktitle={Parallel, Distributed and Network-Based Processing (PDP), 2011 19th Euromicro International Conference on},
  pages={101--108},
  issn={1066-6192},
  organization={IEEE}
}


Sparse matrix-vector multiplication on GPUs faces a serious problem when the vector is too large to be stored in the GPU's device memory. To solve this problem, we propose a novel software-hardware hybrid method for a heterogeneous system in which GPUs and functional memory modules are connected by PCI Express. The functional memory provides a large memory capacity and supports scatter/gather operations. We perform a preliminary evaluation of the proposed method using a sparse matrix benchmark collection. We observe that, on a single GPU, the proposed method achieves a 4.1-times speedup over conventional methods by converting indirect references to direct references without exhausting the GPU's cache memory. The proposed method is inherently scalable in the number of GPUs because communication among GPUs is completely eliminated. We therefore estimate that the performance of the proposed method can be expressed as the single-GPU execution performance, which may be limited by the burst-transfer bandwidth of PCI Express, multiplied by the number of GPUs.
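The idea of converting indirect references to direct references can be illustrated with a small sketch. The abstract does not describe the actual implementation, so everything below is an assumption made for illustration: the matrix is taken to be in CSR format, and the hypothetical function `gather` stands in for the gather operation that the functional memory would perform in hardware. The names `spmv_csr_indirect`, `gather`, and `spmv_csr_direct` are not from the paper.

```python
def spmv_csr_indirect(row_ptr, col_idx, values, x):
    """Conventional CSR SpMV: every product reads x through col_idx (indirect)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def gather(x, col_idx):
    """Hypothetical stand-in for a gather performed outside the GPU.

    Returns the entries of x reordered so that element k lines up with
    values[k]; the result has one entry per nonzero of the matrix.
    """
    return [x[j] for j in col_idx]


def spmv_csr_direct(row_ptr, values, gathered_x):
    """SpMV on the gathered stream: both operands are read with the same index k."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * gathered_x[k]
    return y


# Example matrix (assumed for illustration):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y_indirect = spmv_csr_indirect(row_ptr, col_idx, values, x)
y_direct = spmv_csr_direct(row_ptr, values, gather(x, col_idx))
```

In this sketch the compute kernel never indexes `x` by column, so it only streams two equally long arrays. Under the same assumptions, each GPU could process its own block of rows from its own gathered stream, which is consistent with the abstract's claim that no communication among GPUs is needed.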

Tags: Computer science, Heterogeneous systems, Matrix multiplication, Sparse matrix

April 14, 2011 by hgpu
