Lessons learned from contrasting a BLAS kernel implementations

Andres More
Intel Software Argentina (Argentina Software Design Center)
XIII Workshop procesamiento distribuido y paralelo (WPDP), 2013

@inproceedings{more2013lessons,
   title={Lessons learned from contrasting a BLAS kernel implementations},
   author={More, Andres},
   booktitle={XVIII Congreso Argentino de Ciencias de la Computaci{\'o}n},
   year={2013}
}


This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast the implementation effort and complexity of an optimized BLAS routine on CPU versus GPU, rather than to compare performance. The work contributes a sample procedure for comparing BLAS kernel implementations, guidance on getting started with GPU libraries and offloading and on analyzing their performance, and an account of the issues faced and how they were solved.
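For readers unfamiliar with the routine, SSPR computes the single-precision symmetric rank-one update A := alpha * x * x^T + A, with A stored in packed form. The following is a minimal, unoptimized C sketch of that operation, not the author's code: the function name `sspr_upper_ref` is hypothetical, and it assumes the upper triangle is packed column by column and that `x` has unit stride.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch of the SSPR update: A := alpha * x * x^T + A.
 * Assumptions (not taken from the paper):
 *   - A is n-by-n symmetric; only its upper triangle is stored,
 *     packed column by column into ap (length n * (n + 1) / 2);
 *   - x has n elements with unit stride;
 *   - sspr_upper_ref is a hypothetical name, not a BLAS entry point. */
void sspr_upper_ref(size_t n, float alpha, const float *x, float *ap)
{
    assert(n == 0 || (x != NULL && ap != NULL));

    size_t k = 0; /* offset of element A(0, j) inside ap */
    for (size_t j = 0; j < n; ++j) {
        float t = alpha * x[j];
        /* Update column j of the upper triangle: rows 0..j. */
        for (size_t i = 0; i <= j; ++i)
            ap[k + i] += x[i] * t;
        k += j + 1; /* column j holds j + 1 packed entries */
    }
}
```

A plain loop like this could serve as the baseline against which an optimized CPU or GPU library call is contrasted.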