Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory

Junjie Lai, André Seznec
Project-Team ALF
hal-00686006, 2012

@techreport{LAI-2012-686006,

   hal_id={hal-00686006},

   url={http://hal.inria.fr/hal-00686006},

   title={Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory},

   author={Lai, Junjie and Seznec, Andr{\'e}},

   abstract={In this paper, we studied the NVIDIA GPU architecture characteristics concerning the SGEMM routine and the potential peak performance of SGEMM on Fermi GPUs. Guided by the analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT) and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on a GTX580 Fermi card. We also described how to use native assembly language directly in the CUDA runtime source code.},

   language={Anglais},

   affiliation={ALF – INRIA – IRISA},

   type={Rapport de recherche},

   year={2012},

   month={Apr},

   pdf={http://hal.inria.fr/hal-00686006/PDF/techReport.pdf}

}


In this paper, we studied the NVIDIA GPU architecture characteristics concerning the SGEMM routine and the potential peak performance of SGEMM on Fermi GPUs. Guided by the analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT) and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on a GTX580 Fermi card. We also described how to use native assembly language directly in the CUDA runtime source code.
