high performance computing on graphics processing units: hgpu.org

Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers

Dominik Göddeke, Robert Strzodka

Institut für Angewandte Mathematik, TU Dortmund, Germany

Chapter 7 in: Jakub Kurzak, David A. Bader and Jack J. Dongarra (eds.): Scientific Computing with Multicore and Accelerators, CRC Press, Dec. 2010

In this chapter, we present efficient fine-grained parallelization techniques for robust multigrid solvers, in particular for numerically strong, inherently sequential smoothing operators. We apply them to sparse, ill-conditioned linear systems of equations that arise from grid-based discretization techniques such as finite differences, finite volumes and finite elements. Our exemplary results demonstrate both the numerical and runtime performance of these techniques, as well as significant speedups over conventional CPUs. We implement the parallelization techniques on graphics processors as representatives of throughput-oriented wide-SIMD many-core architectures: GPUs offer a tremendous amount of fine-grained parallelism compared to commodity CPU designs, with up to 30 “cores” and more than 30,000 threads in flight simultaneously on current devices [3]. Our implementation uses NVIDIA CUDA, but the techniques we present are generally applicable to many-core architectures, e.g., using OpenCL [9], an open industry standard targeting diverse multi- and many-core architectures. We refer to the CUDA documentation [11] for an in-depth explanation of the terminology: “memory coalescing” (block memory transfers), “warps” and “half-warps” (SIMD granularity for computation and memory access), “shared memory” (small on-chip scratchpad memory) and “thread blocks” (groups of threads with on-chip data exchange and synchronization).
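The abstract does not name the specific parallelization techniques. As a rough illustration of how an inherently sequential smoother can expose fine-grained parallelism, the sketch below uses red-black (two-color) Gauss-Seidel, a widely known reordering that is assumed here for illustration and is not claimed to be the chapter's method. It assumes a 1D Poisson model problem with fixed boundary values; the function names `red_black_sweep` and `residual_norm` are hypothetical. Points of one color read only neighbors of the other color, so all updates within a color are independent and could each be assigned to its own thread.

```python
def residual_norm(u, f, h):
    """Max-norm of f - A u on interior points, with A = tridiag(-1, 2, -1) / h^2."""
    worst = 0.0
    for i in range(1, len(u) - 1):
        r = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
        worst = max(worst, abs(r))
    return worst


def red_black_sweep(u, f, h, reverse_within_color=False):
    """One red-black Gauss-Seidel sweep, in place; u[0] and u[-1] stay fixed.

    Each color is processed as a separate phase. Within a phase, every update
    reads only points of the other color, so the visiting order is irrelevant.
    """
    n = len(u)
    for color in (1, 0):  # odd interior indices first, then even ones
        indices = [i for i in range(1, n - 1) if i % 2 == color]
        if reverse_within_color:
            indices.reverse()  # any order gives the same result
        for i in indices:
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


if __name__ == "__main__":
    n, h = 17, 1.0 / 16
    f = [1.0] * n
    u = [0.0] * n
    print("residual before:", residual_norm(u, f, h))
    for _ in range(200):
        red_black_sweep(u, f, h)
    print("residual after: ", residual_norm(u, f, h))
```

Because the order within a color does not matter, a forward and a reversed traversal produce identical values, which is what makes the per-color loop safe to run in parallel. On a GPU, each color phase would correspond to one parallel step, with a synchronization between the two phases.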

Tags: CUDA, Mathematics, Mixed precision, nVidia, nVidia GeForce GTX 280, Partial differential equations, PDEs

March 2, 2011 by hgpu
