A massively parallel adaptive fast-multipole method on heterogeneous architectures
Georgia Institute of Technology, Atlanta, GA 30332
In SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (2009), pp. 1-12
@conference{lashuk2009massively,
title={A massively parallel adaptive fast-multipole method on heterogeneous architectures},
author={Lashuk, I. and Chandramowlishwaran, A. and Langston, H. and Nguyen, T.A. and Sampath, R. and Shringarpure, A. and Vuduc, R. and Ying, L. and Zorin, D. and Biros, G.},
booktitle={Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis},
pages={1--12},
year={2009},
organization={ACM}
}
We present new scalable algorithms and a new implementation of our kernel-independent fast multipole method (Ying et al. ACM/IEEE SC ’03), in which we employ both distributed memory parallelism (via MPI) and shared memory/streaming parallelism (via GPU acceleration) to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, our implementation scales well up to 30 billion unknowns on 65K cores (the AMD/CRAY-based Kraken system at NSF/NICS) for highly non-uniform point distributions. On GPU-enabled systems, we achieve a 30x speedup for problems of up to 256 million points on 256 GPUs (Lincoln at NSF/NCSA) over a comparable CPU-only implementation.
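For context, the "two-body non-oscillatory potential" evaluation that the FMM accelerates reduces, in its direct form, to an O(N^2) pairwise sum. The sketch below is not code from the paper; it is a minimal single-threaded C++ illustration, assuming the single-layer Laplace kernel as a representative non-oscillatory potential, of the brute-force baseline that the kernel-independent FMM and its MPI/GPU parallelization replace with roughly O(N) work.

```cpp
// Direct O(N^2) evaluation of the single-layer Laplace potential
//   phi(x_i) = sum_{j != i} q_j / (4*pi*|x_i - x_j|)
// Hypothetical illustration of the baseline computation; the paper's
// FMM approximates this sum hierarchically instead of evaluating it directly.
#include <cmath>
#include <cstdio>
#include <vector>

struct Point { double x, y, z; };

std::vector<double> direct_laplace(const std::vector<Point>& pts,
                                   const std::vector<double>& q) {
    const double kPi = 3.14159265358979323846;
    const double inv4pi = 1.0 / (4.0 * kPi);
    std::vector<double> phi(pts.size(), 0.0);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = 0; j < pts.size(); ++j) {
            if (i == j) continue;                 // skip self-interaction
            const double dx = pts[i].x - pts[j].x;
            const double dy = pts[i].y - pts[j].y;
            const double dz = pts[i].z - pts[j].z;
            const double r  = std::sqrt(dx * dx + dy * dy + dz * dz);
            phi[i] += q[j] * inv4pi / r;          // accumulate pairwise contribution
        }
    }
    return phi;
}

int main() {
    // Tiny example: three source/target points with assumed charges.
    std::vector<Point>  pts = {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}};
    std::vector<double> q   = {1.0, -1.0, 0.5};
    for (double p : direct_laplace(pts, q)) std::printf("%f\n", p);
    return 0;
}
```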
November 27, 2010 by hgpu