Strassen’s Matrix Multiplication on GPUs
Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), 2011
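@inproceedings{li2011strassen,
title={Strassen's Matrix Multiplication on {GPU}s},
author={Li, J. and Ranka, S. and Sahni, S.},
booktitle={IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS)},
year={2011}
}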
We provide efficient single-precision and integer GPU implementations of Strassen’s algorithm as well as of Winograd’s variant. On an NVIDIA C1060 GPU, when multiplying 16384×16384 matrices, a speedup of 32% (35%) is obtained for Strassen’s 4-level implementation and 33% (36%) for Winograd’s variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0. For n = 16384, the maximum numerical error of the single-precision implementations is about two orders of magnitude higher than that of sgemm; the integer implementations have zero error.
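For readers unfamiliar with the recursion the abstract refers to, the following is a minimal CPU sketch of the classic Strassen scheme, which replaces the 8 block multiplications of the standard 2×2 block product with 7. It is not the authors' GPU code: the function names (`strassen`, `naive_mul`, `split`) and the `levels` parameter are hypothetical, the input is assumed to be square, and `naive_mul` merely stands in for the base-case routine (the role sgemm plays in the paper). Integer inputs are used so the result can be checked exactly.

```python
def add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_mul(A, B):
    """Standard O(n^3) product of two n x n matrices (base case)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(M):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, levels):
    """Multiply square matrices A and B using `levels` Strassen recursions.

    Falls back to naive_mul when levels reaches 0 or the size is odd.
    """
    n = len(A)
    if levels == 0 or n % 2 != 0:
        return naive_mul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive block products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), levels - 1)
    M2 = strassen(add(A21, A22), B11, levels - 1)
    M3 = strassen(A11, sub(B12, B22), levels - 1)
    M4 = strassen(A22, sub(B21, B11), levels - 1)
    M5 = strassen(add(A11, A12), B22, levels - 1)
    M6 = strassen(sub(A21, A11), add(B11, B12), levels - 1)
    M7 = strassen(sub(A12, A22), add(B21, B22), levels - 1)

    # Recombine the products into the quadrants of C = A * B.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(strassen(A, B, levels=1))
```

Capping the recursion with `levels` mirrors the idea of a fixed-depth (for example 4-level) implementation: below that depth the extra additions and temporaries of Strassen's scheme no longer pay off, so a conventional multiply takes over.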
January 23, 2012 by hgpu