Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers
Institut für Angewandte Mathematik, TU Dortmund, Germany
Chapter 7 in: Jakub Kurzak, David A. Bader and Jack J. Dongarra (eds.): Scientific Computing with Multicore and Accelerators, CRC Press, Dec. 2010
@incollection{Goeddeke:2010:MPG,
author={Dominik G{\"o}ddeke and Robert Strzodka},
title={Mixed Precision {GPU}-Multigrid Solvers with Strong Smoothers},
booktitle={Scientific Computing with Multicore and Accelerators},
publisher={CRC Press},
chapter={7},
year={2010},
month={dec},
editor={Jakub Kurzak and David A. Bader and Jack J. Dongarra}
}
In this chapter, we present efficient fine-grained parallelization techniques for robust multigrid solvers, in particular for numerically strong, inherently sequential smoothing operators. We apply them to sparse, ill-conditioned linear systems of equations that arise from grid-based discretization techniques such as finite differences, finite volumes, and finite elements. Our exemplary results demonstrate both the numerical and runtime performance of these techniques, as well as significant speedups over conventional CPUs. We implement the parallelization techniques on graphics processors as representatives of throughput-oriented wide-SIMD many-core architectures: GPUs offer a tremendous amount of fine-grained parallelism compared to commodity CPU designs, with up to 30 “cores” and more than 30,000 threads in flight simultaneously on current devices [3]. Our implementation uses NVIDIA CUDA, but the techniques we present are generally applicable to many-core architectures, e.g., using OpenCL [9], an open industry standard targeting diverse multi- and many-core architectures. We refer to the CUDA documentation [11] for an in-depth explanation of the terminology: “memory coalescing” (block memory transfers), “warps” and “half-warps” (SIMD granularity for computation and memory access), shared memory (small on-chip scratchpad memory), and thread blocks (groups of threads with on-chip data exchange and synchronization).
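To illustrate the mixed-precision idea the chapter builds on, the following is a minimal, self-contained sketch of iterative refinement: the residual and the solution update are kept in double precision, while the expensive inner solve runs in single precision. All names and the toy 1D Poisson problem are illustrative; in the chapter itself the inner solver is a GPU multigrid cycle with strong smoothers, for which a plain float32 LU solve stands in here.

```python
import numpy as np

def mixed_precision_refine(A64, b64, outer_iters=20, tol=1e-12):
    """Mixed-precision iterative refinement (illustrative sketch).

    Residual computation and solution accumulation are done in float64;
    the inner solve (the bulk of the work on a GPU) runs in float32.
    """
    A32 = A64.astype(np.float32)      # low-precision copy for the inner solver
    x64 = np.zeros_like(b64)          # high-precision accumulator
    for _ in range(outer_iters):
        r64 = b64 - A64 @ x64         # residual in double precision
        if np.linalg.norm(r64) <= tol * np.linalg.norm(b64):
            break
        # inner solve in single precision; stands in for the GPU multigrid cycle
        c32 = np.linalg.solve(A32, r64.astype(np.float32))
        x64 += c32.astype(np.float64)  # correction added in double precision
    return x64

# Toy problem: 1D Poisson (tridiagonal), a typical multigrid test case.
n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = mixed_precision_refine(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

Although each inner solve is only accurate to single precision, the double-precision residual correction recovers a result accurate to (nearly) double precision after a few outer iterations, which is why the scheme lets most of the arithmetic run in the faster, lower-precision format.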
March 2, 2011 by hgpu