Spherical harmonic transform on heterogeneous architectures using hybrid programming

Mikolaj Szydlarski, Pierre Esterie, Joel Falcou, Laura Grigori, Radek Stompor
INRIA Saclay-Ile de France, F-91893 Orsay, France
arXiv:1106.0159v1 [cs.DC] (1 Jun 2011)

@article{2011arXiv1106.0159S,
   author={{Szydlarski}, M. and {Esterie}, P. and {Falcou}, J. and {Grigori}, L. and {Stompor}, R.},
   title={Spherical harmonic transform on heterogeneous architectures using hybrid programming},
   journal={ArXiv e-prints},
   archivePrefix={arXiv},
   eprint={1106.0159},
   primaryClass={cs.DC},
   keywords={Computer Science – Distributed, Parallel, and Cluster Computing},
   year={2011},
   month={jun},
   adsurl={http://adsabs.harvard.edu/abs/2011arXiv1106.0159S},
   adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications, ranging from climate modeling to cosmological observations. In many of these areas, a new wave of exciting, cutting-edge science goals has recently been proposed, calling for simulations and analyses of actual experimental or observational data at very high resolutions, accompanied by the production or processing of unprecedented volumes of data. Both aspects pose a formidable challenge for currently existing implementations of the transforms. This paper describes a multi CPU-GPU implementation of an inverse SHT, based on hybrid programming combining MPI and CUDA, and discusses its tests as motivated by these forthcoming applications. We present performance comparisons of the multi-GPU version and a hybrid MPI/OpenMP version of the same transform. We find that one NVIDIA Tesla S1070 can accelerate the overall execution time of the SHT by as much as 3 times with respect to the MPI/OpenMP version executed on one quad-core processor (Intel Nehalem 2.93 GHz) and, owing to the very good scalability of both versions, 128 Tesla cards perform as well as 256 twelve-core processors (AMD Opteron 2.1 GHz).
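
For readers unfamiliar with the transform, the inverse SHT mentioned above can be written in its standard textbook form as below. The abstract does not give any formulas, so the notation here ($s$, $a_{\ell m}$, $\Delta_m$, $N_{\ell m}$, $\ell_{\max}$) is an assumption for illustration and may differ from the paper's. The two-step split into a Legendre sum followed by a Fourier sum is the usual way such transforms are organized.

```latex
% Inverse (synthesis) SHT: field on the sphere from harmonic coefficients a_{lm}
s(\theta,\phi) \;=\; \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell}
    a_{\ell m}\, Y_{\ell m}(\theta,\phi),
\qquad
Y_{\ell m}(\theta,\phi) \;=\; N_{\ell m}\, P_{\ell m}(\cos\theta)\, e^{i m \phi}

% Usual two-step evaluation:
% (1) Legendre sum for each m and each ring of constant theta
\Delta_m(\theta) \;=\; \sum_{\ell=|m|}^{\ell_{\max}}
    a_{\ell m}\, N_{\ell m}\, P_{\ell m}(\cos\theta)

% (2) Fourier sum over m along each ring (typically done with an FFT)
s(\theta,\phi) \;=\; \sum_{m=-\ell_{\max}}^{\ell_{\max}} \Delta_m(\theta)\, e^{i m \phi}
```

Here $P_{\ell m}$ are the associated Legendre functions and $N_{\ell m}$ is the normalization constant of the chosen convention.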