FPGA and GPU implementation of large scale SpMV
Department of Electronic Engineering, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
IEEE 8th Symposium on Application Specific Processors (SASP), 2010
@inproceedings{shan2010fpga,
  title={FPGA and GPU implementation of large scale SpMV},
  author={Shan, Y. and Wu, T. and Wang, Y. and Wang, B. and Wang, Z. and Xu, N. and Yang, H.},
  booktitle={Application Specific Processors (SASP), 2010 IEEE 8th Symposium on},
  pages={64--70},
  organization={IEEE},
  year={2010}
}
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in many applications. Many studies have implemented SpMV on different platforms, but little work has focused on very large scale datasets with millions of dimensions. This paper addresses the challenges of implementing large scale SpMV on FPGA and GPU in the application of web link graph analysis. In the FPGA implementation, we designed the task partition and memory hierarchy according to an analysis of the dataset scale and its access pattern. In the GPU implementation, we designed a fast and scalable SpMV routine with three passes, using a modified Compressed Sparse Row (CSR) format. Results show that the FPGA and GPU implementations achieve about 29x and 30x speedup on a Stratix II EP2S180 FPGA and a Radeon 5870 graphics card, respectively, compared with a Phenom 9550 CPU.
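For context, the standard CSR layout stores a sparse matrix as three arrays: row pointers, column indices, and non-zero values. The sketch below shows a baseline sequential SpMV (y = A*x) over this layout in C; it only illustrates the unmodified format, not the paper's modified CSR representation or its three-pass GPU routine.

```c
/* Minimal sketch of baseline CSR SpMV (y = A*x), for illustration only.
 * The paper's modified CSR format and three-pass GPU routine are not
 * reproduced here. */
#include <stddef.h>

void spmv_csr(size_t n_rows,
              const int    *row_ptr,   /* length n_rows + 1           */
              const int    *col_idx,   /* length nnz (non-zero count) */
              const double *val,       /* length nnz                  */
              const double *x,         /* dense input vector          */
              double       *y)         /* dense output vector         */
{
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        /* Accumulate the dot product of row i with x. */
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += val[j] * x[col_idx[j]];
        y[i] = sum;
    }
}
```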
July 10, 2011 by hgpu