
Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs: The Power(q)-pattern Method

Vincent Heuveline, Dimitar Lukarski, Jan-Philipp Weiss
Engineering Mathematics and Computing Lab (EMCL), Karlsruhe Institute of Technology, Germany
Preprint Series of the Engineering Mathematics and Computing Lab (EMCL), No. 2011-08, 2011

@misc{emcl-preprint-2011-08,
   author={Heuveline, Vincent and Lukarski, Dimitar and Weiss, Jan-Philipp},
   title={Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs --- The Power(q)-pattern Method},
   howpublished={EMCL Preprint Series},
   keywords={Parallel preconditioners, fine-grained parallelism, multi-coloring, ILU with fill-ins, power(q)-pattern method, multi-core CPUs, GPU},
   url={http://www.emcl.kit.edu/preprints/emcl-preprint-2011-08.pdf},
   year={2011},
   number={08},
   issn={2191-0693}
}


Application demands and grand challenges in numerical simulation require both highly capable computing platforms and efficient numerical solution schemes. Power constraints and further miniaturization of modern and future hardware give way to multi- and manycore processors with increasingly fine-grained parallelism and deeply nested hierarchical memory systems, as already exemplified by recent graphics processing units. Accordingly, numerical schemes need to be adapted and re-engineered in order to deliver scalable solutions across diverse processor configurations. Portability of parallel software solutions across emerging hardware platforms is another challenge. This work investigates multi-coloring and re-ordering schemes for block Gauss-Seidel methods and, in particular, for incomplete LU factorizations with and without fill-ins. We consider two matrix re-ordering schemes that deliver flexible and efficient parallel preconditioners. The general idea is to generate block decompositions of the system matrix such that the diagonal blocks are themselves diagonal. In this way, parallelism can be exploited at the block level in a scalable manner. Our goal is to provide widely applicable, out-of-the-box preconditioners that can be used in the context of finite element solvers. We propose a new method for anticipating the fill-in pattern of ILU($p$) schemes, which we call the power($q$)-pattern method. This method is based on an incomplete factorization of the system matrix $A$ subject to a predetermined pattern given by the matrix power $|A|^{p+1}$ and its associated multi-coloring permutation $\pi$. We prove that the obtained sparsity pattern is a superset of the sparsity pattern of our modified ILU($p$) factorization applied to $\pi A \pi^{-1}$. As a result, this modified ILU($p$) applied to the multi-colored system matrix has no fill-ins in its diagonal blocks. This leads to an inherently parallel execution of the triangular ILU($p$) sweeps.
In addition, we describe the integration of the preconditioners into the HiFlow$^3$ open-source finite element package, which provides a portable software solution across diverse hardware platforms. On this basis, we conduct a performance analysis across a variety of test problems on multi-core CPUs and GPUs that demonstrates the efficiency, scalability and flexibility of our approach. Our preconditioners accelerate the solver by factors of up to 1.5, 8 and 85 on three different test problems. The GPU versions of the preconditioned solver are up to 4 times faster than an OpenMP parallel version on eight cores.
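
To make the pattern and coloring steps of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the sparsity pattern of $|A|^{p+1}$ for a small tridiagonal matrix, colors that pattern, and checks that no two unknowns of the same color are coupled, which is the condition for the diagonal blocks of the permuted matrix to be diagonal. The greedy coloring heuristic, the test matrix, and all function names are assumptions chosen for illustration; the abstract does not specify the coloring algorithm, and the incomplete factorization itself is not shown.

```python
def pattern(A):
    """Sparsity pattern of a dense matrix: one set of nonzero column indices per row."""
    n = len(A)
    return [{j for j in range(n) if A[i][j] != 0} for i in range(n)]


def power_pattern(pat, k):
    """Pattern of |A|^k. Absolute values rule out cancellation, so row i of the
    product is the union of the rows of |A| indexed by row i of |A|^(k-1)."""
    result = [set(row) for row in pat]
    for _ in range(k - 1):
        result = [set().union(*(pat[m] for m in row)) for row in result]
    return result


def greedy_coloring(pat):
    """Greedy multi-coloring (an assumed heuristic): give each index the smallest
    color not used by an already colored neighbor in the symmetrized pattern."""
    n = len(pat)
    color = [-1] * n
    for i in range(n):
        neighbors = {j for j in pat[i] if j != i}
        neighbors |= {j for j in range(n) if j != i and i in pat[j]}
        used = {color[j] for j in neighbors if color[j] >= 0}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color


def color_permutation(color):
    """Order the indices color by color; this plays the role of pi."""
    return sorted(range(len(color)), key=lambda i: (color[i], i))


def diagonal_blocks_are_diagonal(pat, color):
    """True if no two distinct indices of the same color are coupled in pat."""
    return all(color[i] != color[j]
               for i in range(len(pat)) for j in pat[i] if i != j)


# Hypothetical test matrix: 1D Laplacian stencil [-1, 2, -1] with n = 6.
n, p = 6, 1
A = [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(n)]
     for i in range(n)]

power_pat = power_pattern(pattern(A), p + 1)   # pattern of |A|^(p+1)
colors = greedy_coloring(power_pat)
perm = color_permutation(colors)
ok = diagonal_blocks_are_diagonal(power_pat, colors)
print(colors, perm, ok)
```

Because the coloring is computed on the pattern of $|A|^{p+1}$ rather than on $A$ alone, same-colored unknowns remain uncoupled even for the anticipated fill-in entries, which is what allows each color block to be processed in parallel during the triangular sweeps.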