Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs

Noboru Tanabe, Yuuka Ogawa, Masami Takata, Kazuki Joe
19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2011


@inproceedings{tanabe2011scaleable,
   title={Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs},
   author={Tanabe, N. and Ogawa, Y. and Takata, M. and Joe, K.},
   booktitle={19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
   year={2011}
}








Sparse matrix-vector multiplication on GPUs faces a serious problem when the vector is too large to fit in the GPU's device memory. To solve this problem, we propose a novel software-hardware hybrid method for a heterogeneous system in which GPUs and functional memory modules are connected by PCI Express. The functional memory provides a huge memory capacity together with scatter/gather operations. We present a preliminary evaluation of the proposed method using a sparse matrix benchmark collection. We observe that the proposed method, which converts indirect references into direct references on a single GPU without exhausting the GPU's cache memory, achieves a 4.1x speedup over conventional methods. The proposed method is intrinsically scalable in the number of GPUs because inter-GPU communication is completely eliminated. We therefore estimate that the performance of the proposed method can be expressed as the single-GPU execution performance, which may be limited by the burst-transfer bandwidth of PCI Express, multiplied by the number of GPUs.
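The indirect-to-direct conversion described in the abstract can be sketched in plain Python. This is an illustrative host-side model, not the authors' implementation: a separate gather pass (standing in for the functional memory's gather operation) resolves every indirect reference x[col[j]] into a contiguous buffer up front, so the multiply phase (the GPU-side work) touches only direct, streaming addresses. All names and the CSR layout here are assumptions for illustration.

```python
def gather(x, col):
    """Gather phase (on functional memory): one indirect read per nonzero,
    performed up front, producing a contiguous operand buffer."""
    return [x[c] for c in col]

def spmv_direct(rowptr, val, xg):
    """Multiply phase (on the GPU): CSR SpMV using only direct, sequential
    references into val and the pre-gathered buffer xg."""
    y = []
    for i in range(len(rowptr) - 1):
        y.append(sum(val[j] * xg[j] for j in range(rowptr[i], rowptr[i + 1])))
    return y

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form, multiplied by x = [1, 2, 3]
rowptr = [0, 2, 3]
col    = [0, 2, 1]
val    = [1.0, 2.0, 3.0]
x      = [1.0, 2.0, 3.0]

xg = gather(x, col)                  # -> [1.0, 3.0, 2.0]
print(spmv_direct(rowptr, val, xg))  # -> [7.0, 6.0]
```

Because the gather removes all data-dependent addressing from the multiply loop, each GPU needs only its own gathered buffer, which is consistent with the paper's claim that inter-GPU communication can be eliminated entirely.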
