Automatically Tuned Dense Linear Algebra for Multicore+GPU
Department of Electrical Engineering and Computer Science, The University of Tennessee, Knoxville
Symposium on Application Accelerators in High Performance Computing, 2010
@inproceedings{fu2010automatically,
title={Automatically Tuned Dense Linear Algebra for Multicore+GPU},
author={Fu, Xing and Li, Xue and Peterson, Gregory D.},
booktitle={Application Accelerators in High Performance Computing, 2010 Symposium, Papers},
year={2010}
}
The Multicore+GPU architecture has been adopted in some of the fastest supercomputers listed on the TOP500. The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures such as Multicore+GPU. However, providing portable performance requires manual parameter tuning. This paper presents an automatically tuned LU factorization: its key parameter is tuned automatically to optimize performance for a particular GPU platform. Moreover, we propose a work-stealing scheme and GREEN-synchronization to decrease the power consumption of the LU factorization and accelerate the entire application.
February 17, 2011 by hgpu