A GPU cluster optimized multigrid scheme for computing unsteady incompressible fluid flow

Gyorgy Tegze, Gyula I. Toth

Institute for Solid State Physics and Optics, Wigner Research Centre for Physics, P.O. Box 49, H-1525 Budapest, Hungary

arXiv:1309.7128 [math.NA], (27 Sep 2013)

@article{2013arXiv1309.7128T,
  author={{Tegze}, G. and {T{\'o}th}, G.~I.},
  title={{A GPU cluster optimized multigrid scheme for computing unsteady incompressible fluid flow}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1309.7128},
  primaryClass={math.NA},
  keywords={Mathematics - Numerical Analysis, Physics - Computational Physics},
  year={2013},
  month=sep,
  adsurl={http://adsabs.harvard.edu/abs/2013arXiv1309.7128T},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

A multigrid scheme is proposed that allows efficient implementation on modern CPUs, Many Integrated Core (MIC) devices, and graphics processing units (GPUs). It is shown that wide single instruction, multiple data (SIMD) processing engines are used efficiently when a deep 2h grid hierarchy is replaced with a two-level scheme using 16h-32h restriction. The restriction length can be matched to the SIMD width to fully utilize the capabilities of modern CPUs and GPUs. This also ensures optimal memory transfer, since no strided memory access is required. The number of expensive restriction steps is greatly reduced, and these steps are executed on larger chunks of data, which allows optimal caching strategies. A higher-order interpolated stencil was developed to improve the convergence rate by minimizing spurious interference between the coarse- and fine-scale solutions. The method is demonstrated by solving the pressure equation for 2D incompressible fluid flow; the benchmark setups cover shear-driven laminar flow in a cavity and direct numerical simulation (DNS) of a turbulent jet. We show that the scheme also allows efficient use of distributed-memory computer clusters by decreasing the number of memory transfers between host and compute devices and among cluster nodes. The actual implementation uses hybrid OpenCL/MPI parallelization.
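To make the idea of a single large restriction step concrete, here is a minimal Python sketch of a 16h block-average restriction and a matching piecewise-constant prolongation on a 2D grid. It is an illustration only, not the authors' OpenCL implementation: the function names, the list-of-rows grid layout, and the choice of plain block averaging are all assumptions, and the higher-order interpolated stencil mentioned in the abstract is not reproduced here. What the sketch does show is that each coarse cell is built from contiguous runs of r fine values per row, so no strided access is needed.

```python
def restrict_block_average(fine, r=16):
    """Coarsen a 2D grid (list of rows) by averaging each r-by-r block.

    Hypothetical helper: plain block averaging stands in for the
    paper's restriction operator, which is not specified here.
    """
    ny, nx = len(fine), len(fine[0])
    if ny % r or nx % r:
        raise ValueError("grid dimensions must be multiples of r")
    coarse = [[0.0] * (nx // r) for _ in range(ny // r)]
    for j in range(ny):
        row = fine[j]
        coarse_row = coarse[j // r]
        for i in range(nx // r):
            # One contiguous run of r fine values feeds each coarse cell,
            # which is the access pattern a SIMD unit of width r favors.
            coarse_row[i] += sum(row[i * r:(i + 1) * r])
    scale = 1.0 / (r * r)
    return [[value * scale for value in coarse_row] for coarse_row in coarse]


def prolong_piecewise_constant(coarse, r=16):
    """Copy each coarse value back onto its r-by-r block of fine cells.

    Hypothetical helper: the simplest possible prolongation, used only
    to round-trip the restriction above.
    """
    fine = []
    for coarse_row in coarse:
        fine_row = [value for value in coarse_row for _ in range(r)]
        # Emit r independent copies of the expanded row.
        fine.extend(list(fine_row) for _ in range(r))
    return fine
```

With r = 16, a 32 x 32 fine grid collapses to a 2 x 2 coarse grid in one step, where a conventional 2h hierarchy would need four successive restrictions to reach the same level.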

Tags: ATI, ATI Radeon HD 7970, Fluid dynamics, GPU cluster, Mathematics, MPI, Numerical Analysis, Numerical simulation, nVidia, nVidia GeForce GTX 680, OpenCL

September 30, 2013 by hgpu
