Performance Portability of a GPU Enabled Factorization with the DAGuE Framework
Innovative Computing Laboratory, the University of Tennessee
IEEE International Conference on Cluster Computing (CLUSTER), 2011
Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with nonuniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the GPU subsystem of the DAGuE runtime, and assess, on the Cholesky factorization test case, the minimal efforts required by a programmer to enable GPU acceleration in the DAGuE framework. The performance achieved by this unchanged code, on a variety of heterogeneous and distributed many cores and GPU resources, demonstrates the desired performance portability.
November 3, 2011 by hgpu