Fast and Practical Strassen’s Matrix Multiplication using FPGAs
Department of Electronics and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong
arXiv:2406.02088 [cs.AR]
@misc{ahmad2024fast,
title={Fast and Practical Strassen’s Matrix Multiplication using FPGAs},
author={Afzal Ahmad and Linfeng Du and Wei Zhang},
year={2024},
eprint={2406.02088},
archivePrefix={arXiv},
primaryClass={cs.AR}
}
Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of O(n^3) for n×n matrices. Strassen’s algorithm improves this to O(n^2.807), but its practicality is limited for small to medium matrix sizes due to the large number of additions it introduces. This paper presents a novel FPGA-based implementation of Strassen’s algorithm that achieves superior speed over an optimized General Matrix Multiply (GeMM) implementation for matrices as small as n=256. Our design, tested extensively on two high-performance FPGA accelerators (Alveo U50 and U280) across various data types, matches or surpasses the performance of a highly optimized baseline across a range of matrix sizes.
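For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal software sketch of textbook Strassen recursion in pure Python. It is not the paper's FPGA design. Each level replaces the 8 block multiplications of the standard method with 7, at the cost of 18 block additions and subtractions, which is where the exponent log2(7) ≈ 2.807 comes from. The function names and the `cutoff` parameter (the size at or below which the code falls back to the standard algorithm) are hypothetical and chosen only for illustration.

```python
def naive_matmul(A, B):
    """Standard O(n^3) product of two square matrices (lists of rows)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split an even-sized square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Textbook Strassen recursion; `cutoff` is an illustrative parameter."""
    n = len(A)
    # Fall back to the standard algorithm for small or odd-sized blocks.
    if n <= cutoff or n % 2:
        return naive_matmul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive block products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)

    # Recombine the products into the four quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

The extra additions in the recombination step are the overhead the abstract mentions: in software they usually make recursion worthwhile only above some matrix size, which is why a cutoff to the standard algorithm is common in practice.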
June 9, 2024 by hgpu