Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers



Dominik Göddeke, Robert Strzodka

Institut für Angewandte Mathematik, TU Dortmund, Germany

Chapter 7 in: Jakub Kurzak, David A. Bader and Jack J. Dongarra (eds.): Scientific Computing with Multicore and Accelerators, CRC Press, Dec. 2010


In this chapter, we present efficient fine-grained parallelization techniques for robust multigrid solvers, in particular for numerically strong smoothing operators that are inherently sequential. We apply them to sparse, ill-conditioned linear systems of equations that arise from grid-based discretization techniques such as finite differences, finite volumes, and finite elements. Our example results demonstrate both the numerical and the runtime performance of these techniques, as well as significant speedups over conventional CPUs.

We implement the parallelization techniques on graphics processors as representatives of throughput-oriented, wide-SIMD many-core architectures: compared to commodity CPU designs, GPUs offer a tremendous amount of fine-grained parallelism, with up to 30 “cores” and more than 30,000 threads in flight simultaneously on current devices [3]. Our implementation uses NVIDIA CUDA, but the techniques we present are generally applicable to many-core architectures, e.g., via OpenCL [9], an open industry standard targeting diverse multi- and many-core architectures. We refer to the CUDA documentation [11] for an in-depth explanation of the terminology: “memory coalescing” (block memory transfers), “warps” and “half-warps” (SIMD granularity for computation and memory access), shared memory (small on-chip scratchpad memory), and thread blocks (groups of threads that can exchange data on chip and synchronize).
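A classic, simple illustration of how an inherently sequential smoother can be exposed to fine-grained parallelism is Gauss-Seidel with red-black (two-colour) ordering. This is not necessarily the technique used in the chapter, whose strong smoothers go beyond it; it is only a minimal sketch. It assumes a model problem (-Δu = f on the unit square, zero Dirichlet boundary, 5-point stencil, mesh width h), and the function names `rb_gauss_seidel_sweep` and `residual_norm` are hypothetical. Points of one colour only read neighbours of the other colour, so every update within a half-sweep is independent. On a GPU, each half-sweep could therefore map to one thread per grid point.

```python
import math


def rb_gauss_seidel_sweep(u, f, h, reverse=False):
    """One red-black Gauss-Seidel sweep for -Laplace(u) = f (5-point stencil).

    Hypothetical helper for an assumed model problem. u and f are
    (n+2) x (n+2) lists of lists; the outer ring of u holds the Dirichlet
    boundary values and is never modified. reverse=True visits the points of
    each colour in the opposite order; the result is bitwise identical,
    because a point only reads neighbours of the other colour.
    """
    n = len(u) - 2
    h2 = h * h
    for colour in (0, 1):  # 0 = "red" (i + j even), 1 = "black" (i + j odd)
        points = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
                  if (i + j) % 2 == colour]
        if reverse:
            points.reverse()
        for i, j in points:
            # Solve row (i, j) of the discrete system for u[i][j]; all four
            # neighbours have the opposite colour, so the order of updates
            # within this loop does not matter (parallel on a GPU).
            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                              + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])
    return u


def residual_norm(u, f, h):
    """Euclidean norm of the residual f - A u over the interior points."""
    n = len(u) - 2
    h2 = h * h
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            au = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                  - u[i][j - 1] - u[i][j + 1]) / h2
            total += (f[i][j] - au) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    n = 7
    h = 1.0 / (n + 1)
    f = [[1.0] * (n + 2) for _ in range(n + 2)]
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    print("initial residual:", residual_norm(u, f, h))
    for _ in range(200):
        rb_gauss_seidel_sweep(u, f, h)
    print("residual after 200 sweeps:", residual_norm(u, f, h))
```

Used on its own, Gauss-Seidel converges slowly on fine grids. In a multigrid solver it serves only as a smoother that damps high-frequency error components, while the coarse-grid correction handles the smooth ones.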

Tags: CUDA, Mathematics, Mixed precision, nVidia, nVidia GeForce GTX 280, Partial differential equations, PDEs

March 2, 2011 by hgpu

