A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures

Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, George Biros
Lawrence Livermore National Laboratory, Livermore, CA
Communications of the ACM 55, 2012


   author={Lashuk, Ilya and Chandramowlishwaran, Aparna and Langston, Harper and Nguyen, Tuan-Anh and Sampath, Rahul and Shringarpure, Aashay and Vuduc, Richard and Ying, Lexing and Zorin, Denis and Biros, George},
   title={A massively parallel adaptive fast multipole method on heterogeneous architectures},
   journal={Commun. ACM},
   issue_date={May 2012},
   address={New York, NY, USA}





We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed-memory parallelism (via MPI) and shared-memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high-performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/Cray-based Jaguar system at ORNL. On a GPU-enabled system (NSF’s Keeneland at Georgia Tech/ORNL), we observed a 30x speedup over a single-core CPU implementation and a 7x speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.
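To make the quantities in the abstract concrete, the following is a minimal sketch of the direct O(N^2) pairwise sum that an FMM is designed to approximate, together with the wall-time bound implied by the quoted per-particle cost. It is not the authors' implementation. The Laplace kernel 1/(4*pi*r) is an assumption chosen as one example of a nonoscillatory two-body potential, and the function names `direct_potential` and `wall_time_bound` are hypothetical.

```python
import math


def direct_potential(points, charges):
    """Reference O(N^2) evaluation: u_i = sum over j != i of q_j / (4*pi*|x_i - x_j|).

    Assumes the Laplace kernel and pairwise-distinct points (coincident
    points would divide by zero). An FMM approximates this same sum to a
    prescribed accuracy at much lower cost.
    """
    n = len(points)
    potentials = [0.0] * n
    for i in range(n):
        xi, yi, zi = points[i]
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            xj, yj, zj = points[j]
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            total += charges[j] / (4.0 * math.pi * r)
        potentials[i] = total
    return potentials


def wall_time_bound(num_particles, ns_per_particle):
    """Convert a per-particle cost in nanoseconds to total seconds."""
    return num_particles * ns_per_particle * 1e-9


if __name__ == "__main__":
    # Two unit-distance particles with charges 1 and 2.
    print(direct_potential([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 2.0]))
    # 48 million particles at under 10 ns/particle implies under about 0.48 s.
    print(wall_time_bound(48_000_000, 10))
```

The direct sum is useful mainly as an accuracy check on small particle sets: comparing FMM output against it is one way to confirm a digits-of-accuracy figure such as the six digits quoted above.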
