Locality optimization on a NUMA architecture for hybrid LU factorization
Inria and Université Paris-Sud, France
hal-00957673 (10 March 2014)
@techreport{remy:hal-00957673,
hal_id={hal-00957673},
url={http://hal.inria.fr/hal-00957673},
title={Locality optimization on a NUMA architecture for hybrid LU factorization},
author={R{\'e}my, Adrien and Baboulin, Marc and Sosonkina, Masha and Rozoy, Brigitte},
keywords={ccNUMA; thread placement; dense linear systems; LU factorization; MAGMA library},
language={English},
affiliation={Laboratoire de Recherche en Informatique – LRI, POSTALE – INRIA Saclay – Ile de France, Old Dominion University – ODU},
type={Research Report},
institution={INRIA},
number={RR-8497},
year={2014},
month={Mar},
pdf={http://hal.inria.fr/hal-00957673/PDF/RR-8497.pdf}
}
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular, we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and, consequently, accelerate the global LU factorization. We apply these placement strategies and present performance results for a hybrid multicore/GPU LU algorithm as implemented in the public-domain library MAGMA.
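As a rough illustration of what "placement of the threads" can mean on a NUMA machine, the Python sketch below maps threads to (NUMA node, core) pairs under two generic policies. The function names, both policies, and the machine shape are assumptions made for illustration. They are not taken from the report or from MAGMA, and the abstract does not say which strategies the authors evaluate.

```python
# Illustrative only: hypothetical helpers, not the report's or MAGMA's method.
# A placement is a list of (numa_node, core_within_node) pairs, one per thread.

def _check(n_threads, n_nodes, cores_per_node):
    """Reject requests that do not fit on the (assumed) machine."""
    if n_threads > n_nodes * cores_per_node:
        raise ValueError("more threads than available cores")


def compact_placement(n_threads, n_nodes, cores_per_node):
    """Fill one NUMA node completely before using the next one."""
    _check(n_threads, n_nodes, cores_per_node)
    return [(t // cores_per_node, t % cores_per_node) for t in range(n_threads)]


def scatter_placement(n_threads, n_nodes, cores_per_node):
    """Distribute threads round-robin across the NUMA nodes."""
    _check(n_threads, n_nodes, cores_per_node)
    return [(t % n_nodes, t // n_nodes) for t in range(n_threads)]


def nodes_used(placement):
    """Number of distinct NUMA nodes touched by a placement."""
    return len({node for node, _ in placement})


if __name__ == "__main__":
    # Assumed machine shape: 2 NUMA nodes with 4 cores each, 4 worker threads.
    print(compact_placement(4, 2, 4), nodes_used(compact_placement(4, 2, 4)))
    print(scatter_placement(4, 2, 4), nodes_used(scatter_placement(4, 2, 4)))
```

The sketch only computes a mapping. Actually binding threads and memory to nodes requires an operating-system facility, for example `sched_setaffinity` or `numactl` on Linux. Which policy performs better depends on where the data used by each thread resides.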
March 12, 2014 by hgpu