Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators
INRIA, LaBRI, University of Bordeaux
Symposium on Application Accelerators in High Performance Computing, 2010
@inproceedings{agullo2010dynamically,
title={Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators},
author={Agullo, E. and Augonnet, C. and Dongarra, J. and Ltaief, H. and Namyst, R. and Roman, J. and Thibault, S. and Tomov, S.},
booktitle={Application Accelerators in High Performance Computing, 2010 Symposium, Papers},
year={2010}
}
Although hardware has changed dramatically in the last few years, nodes of multicore chips augmented by Graphics Processing Units (GPUs) appear to be a major trend. Previous approaches to scheduling dense linear algebra operations on such complex nodes achieved high performance, but at a double cost: they did not exploit the potential of all the cores, and they produced static, non-generic code. In this extended abstract, we present a new approach for scheduling dense linear algebra operations on multicore architectures with GPU accelerators, using a dynamic scheduler capable of exploiting the full potential of the node [1]. We underline the benefits in terms of both programmability and performance. We illustrate our approach with a Cholesky factorization relying on cutting-edge GPU and CPU kernels [2], [3], achieving roughly 900 Gflop/s on an eight-core node accelerated with three NVIDIA Tesla GPUs.
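To make the idea of dynamic scheduling concrete, the following is a minimal sketch in Python, not the authors' implementation. It builds the task graph of a right-looking tile Cholesky factorization from the standard kernel names (POTRF, TRSM, SYRK, GEMM) and then simulates a greedy scheduler that starts any task as soon as its inputs are ready. The function names `tile_cholesky_tasks` and `simulate` are hypothetical, and the model assumes identical workers and unit-cost tasks, which a real CPU and GPU node does not satisfy.

```python
from collections import deque


def tile_cholesky_tasks(t):
    """Build tasks and dependencies of a tile Cholesky on a t x t tile grid."""
    tasks = []        # task names in submission order
    deps = {}         # task -> set of tasks it must wait for
    last_writer = {}  # tile (i, j) -> task that last wrote it

    def submit(name, reads, writes):
        # A task waits for the last writer of every tile it touches.
        # Tracking writers only is enough here because, in this algorithm,
        # a tile is never modified again once another task has read it.
        deps[name] = {last_writer[tile] for tile in reads + writes
                      if tile in last_writer}
        for tile in writes:
            last_writer[tile] = name
        tasks.append(name)

    for k in range(t):
        submit(("POTRF", k), [], [(k, k)])              # factor diagonal tile
        for i in range(k + 1, t):
            submit(("TRSM", i, k), [(k, k)], [(i, k)])  # solve panel tile
        for i in range(k + 1, t):
            submit(("SYRK", i, k), [(i, k)], [(i, i)])  # update diagonal tile
            for j in range(k + 1, i):
                # update off-diagonal tile of the trailing submatrix
                submit(("GEMM", i, j, k), [(i, k), (j, k)], [(i, j)])
    return tasks, deps


def simulate(tasks, deps, workers):
    """Greedy dynamic scheduling with unit-cost tasks; returns time steps."""
    remaining = {task: len(deps[task]) for task in tasks}
    successors = {task: [] for task in tasks}
    for task, parents in deps.items():
        for parent in parents:
            successors[parent].append(task)

    ready = deque(task for task in tasks if remaining[task] == 0)
    steps = 0
    while ready:
        # Each idle worker takes one ready task for this time step.
        running = [ready.popleft() for _ in range(min(workers, len(ready)))]
        steps += 1
        for task in running:
            for child in successors[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    ready.append(child)
    return steps


if __name__ == "__main__":
    tasks, deps = tile_cholesky_tasks(4)
    for workers in (1, 2, 8):
        print(workers, "workers:", simulate(tasks, deps, workers), "steps")
```

The order of execution is never fixed in advance: it follows from which tiles each task reads and writes. Under that assumption the same task description can run on any number of workers. A scheduler for a heterogeneous node would additionally need per-device cost estimates and data transfers, which this sketch leaves out.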
February 17, 2011 by hgpu