Strassen’s Matrix Multiplication on GPUs

Junjie Li, Sanjay Ranka, Sartaj Sahni
Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), 2011








We provide efficient single-precision and integer GPU implementations of Strassen’s algorithm and of Winograd’s variant. On an NVIDIA C1060 GPU, when multiplying 16384×16384 matrices, Strassen’s 4-level implementation achieves a speedup of 32% (35%) and Winograd’s variant a speedup of 33% (36%) relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0. For n = 16384, the maximum numerical error of the single-precision implementations is about 2 orders of magnitude higher than that of sgemm; the integer implementations incur no error.
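For readers unfamiliar with the recursion, the following is a minimal CPU sketch of Strassen's seven-product scheme in plain Python. It is not the authors' GPU code. The function names (`strassen`, `classical`, `split`, `mat_add`, `mat_sub`) and the `levels` parameter are illustrative assumptions, with `levels` loosely mirroring the "4-level" depth mentioned above. It assumes square matrices and falls back to the classical product when the size is odd or the recursion depth is exhausted.

```python
# Illustrative sketch only: hypothetical helper names, CPU-only, list-of-lists matrices.

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def classical(A, B):
    # Standard triple-loop product, used as the base case.
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def split(M):
    # Partition a square matrix of even size into four quadrants.
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, levels):
    # Recurse `levels` times, then switch to the classical product.
    n = len(A)
    if levels == 0 or n % 2 != 0:
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    def rec(X, Y):
        return strassen(X, Y, levels - 1)

    # Seven half-size products instead of eight.
    M1 = rec(mat_add(A11, A22), mat_add(B11, B22))
    M2 = rec(mat_add(A21, A22), B11)
    M3 = rec(A11, mat_sub(B12, B22))
    M4 = rec(A22, mat_sub(B21, B11))
    M5 = rec(mat_add(A11, A12), B22)
    M6 = rec(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = rec(mat_sub(A12, A22), mat_add(B21, B22))

    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)

    # Reassemble the four quadrants into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

With Python integers the result matches the classical product exactly, which is consistent with the zero error reported for the integer implementations. With floating-point entries, the extra additions and subtractions are where additional rounding error can enter.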
