Performance Portability of a GPU Enabled Factorization with the DAGuE Framework
Innovative Computing Laboratory, the University of Tennessee
IEEE International Conference on Cluster Computing (CLUSTER), 2011
@inproceedings{bosilca2011performance,
title={Performance Portability of a {GPU} Enabled Factorization with the {DAGuE} Framework},
author={Bosilca, G. and Bouteiller, A. and Herault, T. and Lemarinier, P. and Saengpatsa, N.O. and Tomov, S. and Dongarra, J.J.},
booktitle={Cluster Computing (CLUSTER), 2011 IEEE International Conference on},
pages={395--402},
year={2011},
organization={IEEE}
}
Performance portability is a major challenge faced today by developers on heterogeneous high-performance computers, which consist of an interconnect, memory with nonuniform access, many-core processors, and accelerators such as GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the GPU subsystem of the DAGuE runtime and assess, on the Cholesky factorization test case, the minimal effort required of a programmer to enable GPU acceleration in the DAGuE framework. The performance achieved by this unchanged code on a variety of heterogeneous and distributed many-core and GPU resources demonstrates the desired performance portability.
November 3, 2011 by hgpu