A fight for performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem

Philippe Estival, Luc Giraud
CERFACS, 42 avenue Gustave Coriolis, 31057 Toulouse Cedex, France
hal-00699377, 2012

@article{estival2012performance,
   title={A fight for performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem},
   author={Estival, P. and Giraud, L.},
   year={2012}
}

Scientific computation relies heavily on 64-bit arithmetic. The evolution of Graphics Processing Units into massively parallel vector units, together with the improvement of their programmability, makes them powerful algebraic coprocessors for many classes of matrix computation. But on these processors, which inherit from architectures originally dedicated to video processing, resources for double precision remain scarce. One building block of dense linear algebra, the GEneral Matrix Multiply routine (GEMM), has been considerably accelerated on the GPU. In this paper we examine its speed in detail, but first and foremost its accuracy.

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors