A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems
Indiana University-Purdue University Indianapolis
Indiana University-Purdue University Indianapolis, 2014
@article{song2014scalable,
title={A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems},
author={Song, Fengguang and Dongarra, Jack},
year={2014}
}
Aiming to fully exploit the computing power of all CPUs and all GPUs on hybrid CPU-GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms that maximize the degree of parallelism, minimize the communication volume, and accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task-assignment protocol to resolve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Our approach demonstrates performance better than both vendor (e.g., Intel MKL) and open-source libraries (e.g., StarPU, PLASMA) in the following four environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, shared-memory systems with multiple GPUs, and shared-memory multicore computers.
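For readers unfamiliar with tile algorithms, the following is a minimal sketch of a generic tiled Cholesky factorization in plain Python. It is not the authors' heterogeneous algorithm or runtime: it assumes uniform tile sizes, runs sequentially on one CPU, and does no scheduling or communication. All function names (`potrf`, `trsm`, `update`, `tile_cholesky`, `to_tiles`, `from_tiles`) are illustrative, borrowed loosely from LAPACK/BLAS naming. Each call on a tile corresponds to one task in the kind of task graph the abstract describes.

```python
import math

def potrf(a):
    """Unblocked Cholesky of one square tile; returns lower-triangular L."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l

def trsm(l, b):
    """Solve X * L^T = B for X, where L is lower triangular."""
    n = len(l)
    x = [[0.0] * n for _ in b]
    for r, row in enumerate(b):
        for j in range(n):
            s = row[j] - sum(x[r][k] * l[j][k] for k in range(j))
            x[r][j] = s / l[j][j]
    return x

def update(c, a, b):
    """Return C - A * B^T (covers both the SYRK and GEMM tile updates)."""
    inner = len(a[0])
    return [[c[i][j] - sum(a[i][k] * b[j][k] for k in range(inner))
             for j in range(len(c[0]))] for i in range(len(c))]

def tile_cholesky(tiles):
    """Factor a symmetric positive-definite matrix stored as a grid of tiles.

    Only tiles[i][j] with i >= j are read and overwritten with the factor.
    """
    t = len(tiles)
    for k in range(t):
        tiles[k][k] = potrf(tiles[k][k])              # factor diagonal tile
        for i in range(k + 1, t):                     # panel solves
            tiles[i][k] = trsm(tiles[k][k], tiles[i][k])
        for i in range(k + 1, t):                     # trailing updates
            for j in range(k + 1, i + 1):
                tiles[i][j] = update(tiles[i][j], tiles[i][k], tiles[j][k])
    return tiles

def to_tiles(a, b):
    """Split an n x n matrix (n divisible by b) into b x b tiles."""
    t = len(a) // b
    return [[[row[j * b:(j + 1) * b] for row in a[i * b:(i + 1) * b]]
             for j in range(t)] for i in range(t)]

def from_tiles(tiles, b):
    """Assemble the lower-triangular factor from the factored tiles."""
    t = len(tiles)
    n = t * b
    l = [[0.0] * n for _ in range(n)]
    for i in range(t):
        for j in range(i + 1):
            for r in range(b):
                for c in range(b):
                    l[i * b + r][j * b + c] = tiles[i][j][r][c]
    return l

if __name__ == "__main__":
    # Symmetric, strictly diagonally dominant, hence positive definite.
    A = [[10.0, 2.0, 3.0, 1.0],
         [2.0, 9.0, 1.0, 2.0],
         [3.0, 1.0, 8.0, 1.0],
         [1.0, 2.0, 1.0, 7.0]]
    L = from_tiles(tile_cholesky(to_tiles(A, 2)), 2)
    for row in L:
        print(["%.4f" % v for v in row])
```

Within one step `k`, the panel solves are independent of each other, as are the trailing updates. That independence is the parallelism a dynamic scheduler can exploit, and in a heterogeneous setting it is where tiles could be assigned to either CPUs or GPUs.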
August 1, 2014 by hgpu