Autotuning GEMMs for Fermi

Jakub Kurzak, Stanimire Tomov, Jack Dongarra
Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
University of Tennessee, Computer Science Technical report UT-CS-11-671, 2011









In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial component of numerical software packages, such as LAPACK and ScaLAPACK, the general dense matrix multiplication routine is one of the more important workloads to be implemented on these devices. This article presents a methodology for producing matrix multiplication kernels tuned for a specific architecture, through a canonical process of heuristic autotuning, based on generation of multiple code variants and selecting the fastest ones through benchmarking. The key contribution of this work is in the method for generating the search space; specifically, pruning it to a manageable size. Performance numbers match or exceed other available implementations.
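The generate, prune, and benchmark loop described in the abstract can be sketched in a few lines. The following is a minimal CPU-only illustration in Python, not the authors' code generator: the tile parameters `bm`, `bn`, `bk`, the divisibility rule, and the `max_tile_elems` budget are all hypothetical stand-ins for the hardware constraints a real Fermi search would apply.

```python
import itertools
import time


def blocked_matmul(A, B, n, bm, bn, bk):
    """Multiply two n x n matrices (lists of lists) using bm x bn x bk tiles."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, bm):
        for j0 in range(0, n, bn):
            for k0 in range(0, n, bk):
                for i in range(i0, i0 + bm):
                    Ai, Ci = A[i], C[i]
                    for k in range(k0, k0 + bk):
                        a, Bk = Ai[k], B[k]
                        for j in range(j0, j0 + bn):
                            Ci[j] += a * Bk[j]
    return C


def generate_candidates(sizes):
    """Enumerate the full search space: every (bm, bn, bk) combination."""
    return list(itertools.product(sizes, repeat=3))


def prune(candidates, n, max_tile_elems):
    """Discard variants that violate the (hypothetical) constraints:
    tiles must divide n, and the three tiles must fit a size budget."""
    return [
        (bm, bn, bk)
        for bm, bn, bk in candidates
        if n % bm == 0 and n % bn == 0 and n % bk == 0
        and bm * bk + bk * bn + bm * bn <= max_tile_elems
    ]


def benchmark(variant, A, B, n, repeats=2):
    """Return the best wall-clock time over a few runs of one variant."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        blocked_matmul(A, B, n, *variant)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(n=16, sizes=(2, 4, 8, 16), max_tile_elems=96):
    """Generate, prune, benchmark, and pick the fastest surviving variant."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    candidates = generate_candidates(sizes)
    survivors = prune(candidates, n, max_tile_elems)
    timings = {v: benchmark(v, A, B, n) for v in survivors}
    best = min(timings, key=timings.get)
    return candidates, survivors, best
```

Calling `autotune()` returns the full candidate list, the pruned list, and the fastest surviving tile shape on the machine where it runs. The point of the sketch is the shape of the process: pruning happens before any timing, so only variants that pass the constraints cost benchmark time.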
