high performance computing on graphics processing units: hgpu.org

Fast and Practical Strassen’s Matrix Multiplication using FPGAs

Afzal Ahmad, Linfeng Du, Wei Zhang

Department of Electronics and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong

arXiv:2406.02088 [cs.AR]

DOI:10.48550/arXiv.2406.02088

Package: Fast and Practical FPGA-based Strassen’s Matrix Multiplication

Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of O(n^3) for n×n matrices. Strassen’s algorithm improves this to O(n^2.807), but its practicality is limited for small to medium matrix sizes due to the large number of additions it introduces. This paper presents a novel FPGA-based implementation of Strassen’s algorithm that achieves superior speed over an optimized General Matrix Multiply (GeMM) implementation for matrices as small as n=256. Our design, tested extensively on two high-performance FPGA accelerators (Alveo U50 and U280) across various data types, matches or surpasses the performance of a highly optimized baseline across a range of matrix sizes.
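For readers unfamiliar with the algorithm, the following is a minimal software sketch of textbook Strassen recursion in plain Python. It is not the FPGA design described in the paper. The function names, the `cutoff` parameter, and the restriction to square matrices with power-of-two sizes are assumptions made for illustration. Each level replaces 8 block products with 7 at the cost of 18 block additions and subtractions, which is where the exponent log2(7) ≈ 2.807 and the extra additions mentioned in the abstract both come from.

```python
import math

def mat_add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive_matmul(A, B):
    """Standard O(n^3) matrix product, used as the recursion base case."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, cutoff=2):
    """Strassen product of two n x n matrices (n a power of two assumed).

    Falls back to naive_matmul once n <= cutoff or n is odd. The cutoff
    value is an illustrative assumption, not a tuned threshold.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return naive_matmul(A, B)
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven recursive block products instead of eight.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)
    # Recombine the products into the four quadrants of the result.
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom

def scalar_mults(n, cutoff=1):
    """Count scalar multiplications performed by strassen() for size n."""
    if n <= cutoff or n % 2:
        return n ** 3
    return 7 * scalar_mults(n // 2, cutoff)

if __name__ == "__main__":
    n = 8
    A = [[(i * n + j) % 7 - 3 for j in range(n)] for i in range(n)]
    B = [[(i + 2 * j) % 5 - 2 for j in range(n)] for i in range(n)]
    print(strassen(A, B) == naive_matmul(A, B))
    print(scalar_mults(256), 256 ** 3, round(math.log2(7), 3))
```

With recursion carried down to 1×1 blocks, a 256×256 product needs 7^8 = 5,764,801 scalar multiplications against 256^3 = 16,777,216 for the standard algorithm. In practice a larger cutoff is usually chosen, because each recursion level trades one block multiplication for many block additions.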

Tags: BLAS, Computer science, FPGA, GEMM, Linear Algebra, Machine learning, Matrix multiplication, OpenCL, Package

June 9, 2024 by hgpu
