Automatic library generation for BLAS3 on GPUs

Huimin Cui, Lei Wang, Jingling Xue, Yang Yang, Xiaobing Feng
Institute of Computing Technology, Chinese Academy of Sciences, China
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2011


   title={Automatic library generation for BLAS3 on GPUs},
   author={Cui, H. and Wang, L. and Xue, J. and Yang, Y. and Feng, X.},
   booktitle={Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International},








High-performance libraries, the performance-critical building blocks for high-level applications, will assume greater importance on modern processors as these processors become more complex and diverse. However, automatic library generators are still immature, forcing library developers to tune libraries manually to meet their performance objectives. We are developing a new script-controlled compilation framework that helps domain experts reduce much of the tedium and error-proneness of manual tuning by enabling them to leverage their expertise and reuse past optimization experience. We focus on demonstrating the improved performance and productivity obtained by using our framework to tune BLAS3 routines on three GPU platforms: speedups over CUBLAS of up to 5.4x on NVIDIA GeForce 9800, 2.8x on GTX285, and 3.4x on Fermi Tesla C2050. Our results highlight the potential benefits of exploiting domain expertise and the relations between different routines (in terms of their algorithms and data structures).
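To make the remark about relations between routines concrete, here is a minimal reference sketch of two BLAS3 operations in plain Python. It is not the authors' framework or generated code. The function names (`gemm`, `symm_lower`, `expand_lower`) and the list-of-lists layout are assumptions chosen for illustration. It only shows that SYMM computes the same product as GEMM while reading a different data structure (one triangle of a symmetric matrix), which is one kind of relation that could let a GEMM optimization be reused.

```python
# Illustrative only: plain-Python reference versions of two BLAS3 routines.
# Function names and the list-of-lists layout are assumptions for this sketch.

def gemm(alpha, a, b, beta, c):
    """Return alpha * A @ B + beta * C for dense row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def symm_lower(alpha, a_lower, b, beta, c):
    """Return alpha * A @ B + beta * C where A is symmetric and only its
    lower triangle (a_lower[i][j] with j <= i) is ever read."""
    m, n = len(a_lower), len(b[0])

    def sym(i, p):
        # Mirror the index so the upper triangle is never touched.
        return a_lower[i][p] if p <= i else a_lower[p][i]

    return [
        [alpha * sum(sym(i, p) * b[p][j] for p in range(m)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def expand_lower(a_lower):
    """Build the full symmetric matrix from its lower triangle."""
    m = len(a_lower)
    return [[a_lower[i][j] if j <= i else a_lower[j][i] for j in range(m)]
            for i in range(m)]
```

Calling `symm_lower` on a matrix whose upper triangle holds arbitrary values gives the same result as calling `gemm` on `expand_lower` of that matrix, so the two routines differ in how the matrix is accessed rather than in the arithmetic performed.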