Scalable Fast Multipole Methods on Heterogeneous Architecture

hgpu.org » Programming » Algorithms » Scalable Fast Multipole Methods on Heterogeneous Architecture

Scalable Fast Multipole Methods on Heterogeneous Architecture

Q. Hu

University of Maryland, College Park

@article{hu2013scalable,

title={Scalable Fast Multipole Methods on Heterogeneous Architecture},

author={Hu, Qi},

year={2013}

}

Download (PDF)

View

Source

1697

views

The N-body problem appears in many computational physics simulations. At each time step the computation involves an all-pairs sum whose complexity is quadratic, followed by an update of particle positions. This cost means that it is not practical to solve such dynamic N-body problems on large scale. To improve this situation, we use both algorithmic and hardware approaches. Our algorithmic approach is to use the Fast Multipole Method (FMM), which is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop, to reduce such quadratic complexity to linear with guaranteed accuracy. Our hardware approach is to use heterogeneous clusters, which comprised of nodes that contain multi-core CPUs tightly coupled with accelerators, such as graphics processors unit (GPU) as our underline parallel processing hardware, on which efficient implementations require highly non-trivial re-designed algorithms. In this dissertation, we fundamentally reconsider the FMM algorithms on heterogeneous architectures to achieve a significant improvement over recent/previous implementations in literature and to make the algorithm ready for use as a workhorse simulation tool for both time-dependent vortex flow problems and for boundary element methods. Our major contributions include: 1. Novel FMM data structures using parallel construction algorithms for dynamic problems. 2. A fast hetegenenous FMM algorithm for both single and multiple computing nodes. 3. An efficient inter-node communication management using fast parallel data structures. 4. A scalable FMM algorithm using novel Helmholz decomposition for Vortex Methods (VM). The proposed algorithms can handle non-uniform distributions with irregular partition shapes to achieve workload balance and their MPI-CUDA implementations are highly tuned up and demonstrate the state of the art performances.

Tags: Algorithms, Computer science, CUDA, Fast multipole method, Heterogeneous systems, N-body simulation, nVidia, nVidia GeForce 8800 GTX, Tesla C1060, Tesla C2050, Thesis

October 9, 2013 by hgpu

No votes yet.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

high performance computing on graphics processing units: hgpu.org