An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm

Guangye Chen, Luis Chacon, Daniel C. Barnes

Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA

arXiv:1111.5295v1 [physics.plasm-ph] (22 Nov 2011)

@article{2011arXiv1111.5295C,
  author={{Chen}, G. and {Chac{\'o}n}, L. and {Barnes}, D.~C.},
  title={{An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1111.5295},
  primaryClass={physics.plasm-ph},
  keywords={Physics - Plasma Physics, Physics - Computational Physics},
  year={2011},
  month={nov},
  adsurl={http://adsabs.harvard.edu/abs/2011arXiv1111.5295C},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

Recently, a fully implicit, energy- and charge-conserving particle-in-cell (PIC) method has been proposed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230, 18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver, which can use very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is that the particle-orbit computations are segregated from the field solver while the scheme remains fully self-consistent. This paper describes a very efficient, mixed-precision, hybrid CPU-GPU implementation of the implicit PIC algorithm that exploits this feature. The JFNK solver is kept on the CPU in double precision (DP), while the implicit, charge-conserving, adaptive particle mover is implemented on a GPU (graphics processing unit) using CUDA in single precision (SP). Performance-oriented optimizations are introduced with the aid of the roofline model. The implicit particle mover is shown to achieve up to 400 GOp/s on an Nvidia GeForce GTX 580. This corresponds to 25% absolute GPU efficiency against the peak theoretical performance, and is about 300 times faster than an equivalent serial CPU (Intel Xeon X5460) execution. For the test case chosen, the mixed-precision hybrid CPU-GPU solver is shown to outperform the DP CPU-only serial version by a factor of ~100, without apparent loss of robustness or accuracy in a challenging long-timescale ion acoustic wave simulation.
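The roofline model mentioned in the abstract bounds attainable throughput by the lower of two ceilings: the compute peak, and memory bandwidth multiplied by arithmetic intensity (operations per byte moved). The short Python sketch below checks the quoted 25% efficiency against that bound. It is illustrative only and is not the authors' analysis: the peak single-precision rate (1581 GOp/s) and memory bandwidth (192.4 GB/s) are assumed vendor figures for the GTX 580, not values stated in the abstract, and the function names are hypothetical.

```python
def attainable_gops(peak_gops, bandwidth_gbs, intensity_ops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gops, bandwidth_gbs * intensity_ops_per_byte)


def ridge_point(peak_gops, bandwidth_gbs):
    """Arithmetic intensity (ops/byte) above which a kernel is compute-bound."""
    return peak_gops / bandwidth_gbs


# ASSUMED hardware figures (vendor specifications, not from the abstract).
PEAK_SP_GOPS = 1581.0   # single-precision peak, GOp/s
BANDWIDTH_GBS = 192.4   # device memory bandwidth, GB/s

# Figure quoted in the abstract for the implicit particle mover.
MEASURED_GOPS = 400.0

# Fraction of the assumed peak that the measured rate represents.
efficiency = MEASURED_GOPS / PEAK_SP_GOPS

# Smallest arithmetic intensity at which the memory roof alone
# would still permit the measured rate.
min_intensity = MEASURED_GOPS / BANDWIDTH_GBS

if __name__ == "__main__":
    print(f"efficiency vs. assumed peak: {efficiency:.1%}")
    print(f"ridge point: {ridge_point(PEAK_SP_GOPS, BANDWIDTH_GBS):.2f} ops/byte")
    print(f"minimum intensity for 400 GOp/s: {min_intensity:.2f} ops/byte")
```

Under these assumed figures, the measured rate is consistent with the 25% efficiency quoted in the abstract, and reaching it requires the kernel to perform at least about two operations per byte of device-memory traffic.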

Tags: Algorithms, Computational Physics, CUDA, nVidia, nVidia GeForce GTX 580, Optimization, Particle-in-cell methods, Physics, Plasma physics

November 23, 2011 by hgpu
