high performance computing on graphics processing units: hgpu.org


A Sparse Matrix Personality for the Convey HC-1

Krishna K. Nagar, Jason D. Bakos

Dept. of Computer Science and Engineering, University of South Carolina, Columbia, SC USA

IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2011

DOI:10.1109/FCCM.2011.60

@inproceedings{nagarsparse,
  title={A Sparse Matrix Personality for the Convey HC-1},
  author={Nagar, K.K. and Bakos, J.D.},
  booktitle={IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2011},
  year={2011}
}


In this paper we describe a double-precision floating-point sparse matrix-vector multiplier (SpMV) and its performance as implemented on a Convey HC-1 reconfigurable computer. The primary contributions of this work are a novel streaming reduction architecture for floating-point accumulation, a novel on-chip cache optimized for streaming compressed sparse row (CSR) matrices, and end-to-end integration with the HC-1’s system, programming model, and runtime environment. The design is composed of 32 parallel processing elements (PEs), each connected to the HC-1’s coprocessor memory and each containing a streaming multiply-accumulator and a local vector cache. On the HC-1, each PE has a peak throughput of 300 double-precision MFLOP/s, giving a total peak throughput of 9.6 GFLOP/s. For our test matrices, we demonstrate up to 40% of peak performance and compare these results with those obtained using the cuSPARSE library on an NVIDIA Tesla S1070 GPU. In most cases our implementation exceeds the performance of the GPU.
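As a software reference point for the computation described above, the following minimal Python sketch computes y = Ax for a matrix stored in CSR form (`row_ptr`, `col_idx`, `values`) and splits the rows into contiguous blocks, one per processing element. The function names, the even row-block partitioning, and the sequential loop standing in for 32 parallel PEs are illustrative assumptions, not the authors' design. The inner `acc +=` is the per-row floating-point accumulation that the paper's streaming multiply-accumulator and reduction architecture address in hardware; here it is simply a sequential sum.

```python
from typing import List, Sequence


def partition_rows(n_rows: int, n_pes: int) -> List[range]:
    """Split rows 0..n_rows-1 into n_pes contiguous blocks of near-equal size."""
    base, extra = divmod(n_rows, n_pes)
    blocks, start = [], 0
    for p in range(n_pes):
        size = base + (1 if p < extra else 0)  # first `extra` blocks get one more row
        blocks.append(range(start, start + size))
        start += size
    return blocks


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float],
             n_pes: int = 32) -> List[float]:
    """Compute y = A x for A in CSR form, one row block per (simulated) PE."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Hypothetical static row partitioning; blocks run sequentially here,
    # whereas on hardware each block would be streamed by its own PE.
    for block in partition_rows(n_rows, n_pes):
        for i in block:
            acc = 0.0  # per-row multiply-accumulate
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[k] * x[col_idx[k]]  # irregular gather from x
            y[i] = acc
    return y


def peak_gflops(n_pes: int = 32, mflops_per_pe: float = 300.0) -> float:
    """Aggregate peak throughput in GFLOP/s from per-PE peak MFLOP/s."""
    return n_pes * mflops_per_pe / 1000.0


# 3x3 example: A = [[4, 0, 1], [0, 0, 0], [2, 3, 0]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 0, 1]
values = [4.0, 1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0]
print(csr_spmv(row_ptr, col_idx, values, x))  # [7.0, 0.0, 8.0]
print(peak_gflops())  # 9.6
```

With the figures from the abstract, 32 PEs at 300 MFLOP/s each give the 9.6 GFLOP/s peak, so 40% of peak corresponds to roughly 3.84 GFLOP/s sustained.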

Tags: Computer science, CUDA, FPGA, nVidia, Performance, Sparse matrix, Tesla S1070

July 14, 2011 by hgpu
