Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Aparna Chandramowlishwaran, Samuel Williams, Leonid Oliker, Ilya Lashuk, George Biros, Richard Vuduc
CRD, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium (April 2010), pp. 1-12.

@conference{chandramowlishwaran2010optimizing,
   title={Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures},
   author={Chandramowlishwaran, A. and Williams, S. and Oliker, L. and Lashuk, I. and Biros, G. and Vuduc, R.},
   booktitle={Parallel \& Distributed Processing (IPDPS), 2010 IEEE International Symposium on},
   pages={1--12},
   issn={1530-2075},
   year={2010},
   organization={IEEE}
}


This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25x on Intel’s quad-core Nehalem, 9.4x on AMD’s quad-core Barcelona, and 37.6x on Sun’s Victoria Falls (dual-socket configurations on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA’s most advanced GPU architecture.
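As a rough illustration of what a "data structure transformation" can mean for an FMM near-field kernel, the sketch below evaluates a direct particle-to-particle sum twice: once over an array-of-structures layout and once over a structure-of-arrays layout. It is not taken from the paper. The 1/r kernel, the function names, and the choice of layouts are all assumptions made for illustration, and plain Python shows only the change in data layout, not the performance effect.

```python
import math

def direct_potential_aos(targets, sources):
    """Direct sum over an array-of-structures layout (hypothetical helper).

    targets: list of (x, y, z); sources: list of (x, y, z, q).
    Assumes a 1/r kernel, which the abstract does not specify.
    """
    out = []
    for tx, ty, tz in targets:
        acc = 0.0
        for sx, sy, sz, q in sources:
            dx, dy, dz = tx - sx, ty - sy, tz - sz
            r2 = dx * dx + dy * dy + dz * dz
            if r2 > 0.0:  # skip a source that coincides with the target
                acc += q / math.sqrt(r2)
        out.append(acc)
    return out

def to_soa(sources):
    """Split (x, y, z, q) tuples into four separate coordinate/charge arrays."""
    xs = [s[0] for s in sources]
    ys = [s[1] for s in sources]
    zs = [s[2] for s in sources]
    qs = [s[3] for s in sources]
    return xs, ys, zs, qs

def direct_potential_soa(targets, xs, ys, zs, qs):
    """Same direct sum over a structure-of-arrays layout.

    In a compiled language, unit-stride arrays like these are generally
    easier to vectorize than interleaved records.
    """
    out = []
    n = len(qs)
    for tx, ty, tz in targets:
        acc = 0.0
        for j in range(n):
            dx, dy, dz = tx - xs[j], ty - ys[j], tz - zs[j]
            r2 = dx * dx + dy * dy + dz * dz
            if r2 > 0.0:
                acc += qs[j] / math.sqrt(r2)
        out.append(acc)
    return out

# Two sources and one target; both layouts should give the same potential.
sources = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 2.0)]
targets = [(0.0, 0.0, 1.0)]
phi_aos = direct_potential_aos(targets, sources)
phi_soa = direct_potential_soa(targets, *to_soa(sources))
```

The outer loop over targets is independent per iteration, so it is the natural place for thread-level parallelism such as the OpenMP parallelization the abstract mentions.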