High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster

Dimitri Komatitsch, Gordon Erlebacher, Dominik Göddeke, David Michéa
Université de Pau et des Pays de l’Adour, CNRS & INRIA Magique-3D, Laboratoire de Modélisation et d’Imagerie en Géosciences UMR 5212, Avenue de l’Université, 64013 Pau Cedex, France
Journal of Computational Physics, Volume 229, Issue 20, 1 October 2010, Pages 7692-7714

@article{komatitsch2010high,
   title={High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster},
   author={Komatitsch, D. and Erlebacher, G. and G{\"o}ddeke, D. and Mich{\'e}a, D.},
   journal={Journal of Computational Physics},
   year={2010},
   publisher={Elsevier}
}

We implement a high-order finite-element application, which performs the numerical simulation of seismic wave propagation resulting for instance from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA Tesla graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. Contrary to many finite-element implementations, ours is implemented successfully in single precision, maximizing the performance of current generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing very optimized implementation in C language and MPI on a classical cluster of CPU nodes. We use mesh coloring to efficiently handle summation operations over degrees of freedom on an unstructured mesh, and non-blocking MPI messages in order to overlap the communications across the network and the data transfer to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements and depending on how the problem is mapped to the reference CPU cluster, we obtain a speedup of 20x or 12x.
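
The mesh coloring mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' CUDA implementation; it is a plain Python illustration under simple assumptions: each element is a tuple of global node indices, a greedy pass assigns colors so that no two elements sharing a node receive the same color, and per-element contributions are then summed one color at a time. The function names `color_elements` and `assemble_by_color` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: elements that share a node get different colors.

    `elements` is a list of tuples of global node indices (an assumed,
    simplified mesh representation).
    """
    # Map each node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors (elements sharing a node).
        used = {
            colors[other]
            for n in nodes
            for other in node_to_elems[n]
            if colors[other] != -1
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, colors, elem_values, num_nodes):
    """Sum per-element contributions into a global array, color by color.

    Within one color no two elements write to the same node, so on a GPU
    the inner loop over elements could run in parallel without atomic
    updates. Here it runs serially for illustration.
    """
    result = [0.0] * num_nodes
    num_colors = max(colors, default=-1) + 1
    for c in range(num_colors):
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            for n, v in zip(nodes, elem_values[e]):
                result[n] += v
    return result
```

For a one-dimensional chain of two-node elements, this greedy pass alternates between two colors, and the assembled array matches what a direct serial summation would give. On an unstructured 3D mesh more colors are generally needed, and the greedy ordering used here is only one of several possible strategies.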