high performance computing on graphics processing units: hgpu.org

Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures

Qi Hu, Nail A. Gumerov, Ramani Duraiswami

University of Maryland, College Park

2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11), 2011

DOI:10.1145/2063384.2063432

@article{hu2011scalable,
  title={Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures},
  author={Hu, Q. and Gumerov, N.A. and Duraiswami, R.},
  year={2011}
}


We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop. Using the observation that the local summation and the analysis-based translation parts of the FMM are independent, we map these respectively to the GPUs and CPUs. Careful analysis of the FMM is performed to distribute work optimally between the multicore CPUs and the GPU accelerators. We first develop a single node version where the CPU part is parallelized using OpenMP and the GPU version via CUDA. New parallel algorithms for creating FMM data structures are presented together with load balancing strategies for the single node and distributed multiple-node versions. Our implementation can perform the N-body sum for 128M particles on 16 nodes in 4.23 seconds, a performance not achieved by others in the literature on such clusters.
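The split described in the abstract can be illustrated with a minimal sketch of the local (near-field) summation, the part the authors map to the GPUs. This is not the authors' implementation: it is plain sequential Python, and it assumes a 1/r (Laplace) kernel, particles in the unit cube, and a uniform box grid at a single octree level. The names `box_index`, `build_boxes`, and `near_field_sum` are hypothetical. Contributions from boxes that are not adjacent are deliberately left out, since in an FMM those would be handled by the translation stage.

```python
import math
from collections import defaultdict


def box_index(point, level):
    """Integer (ix, iy, iz) of the box holding `point` at octree `level`.

    Assumption: all points lie in the unit cube [0, 1]^3.
    """
    n = 2 ** level  # boxes per dimension at this level
    return tuple(min(int(c * n), n - 1) for c in point)


def build_boxes(points, level):
    """Group particle indices by the box that contains them."""
    boxes = defaultdict(list)
    for i, p in enumerate(points):
        boxes[box_index(p, level)].append(i)
    return boxes


def near_field_sum(points, charges, level):
    """Direct 1/r sum over sources in each target's own and adjacent boxes.

    Sources in more distant boxes are ignored here; an FMM would account
    for them through multipole and local expansions instead.
    """
    boxes = build_boxes(points, level)
    phi = [0.0] * len(points)
    offsets = (-1, 0, 1)
    for (bx, by, bz), targets in boxes.items():
        for dx in offsets:
            for dy in offsets:
                for dz in offsets:
                    # .get() avoids creating empty boxes while iterating
                    sources = boxes.get((bx + dx, by + dy, bz + dz))
                    if not sources:
                        continue
                    for t in targets:
                        for s in sources:
                            if s == t:
                                continue  # skip self-interaction
                            r = math.dist(points[t], points[s])
                            if r > 0.0:
                                phi[t] += charges[s] / r
    return phi


if __name__ == "__main__":
    pts = [(0.25, 0.25, 0.25), (0.75, 0.25, 0.25)]
    q = [1.0, 2.0]
    print(near_field_sum(pts, q, 1))  # boxes are adjacent: full pair sum
    print(near_field_sum(pts, q, 2))  # boxes are separated: near field is empty
```

At level 1 the two particles sit in adjacent boxes, so the near-field sum equals the full direct sum. At level 2 their boxes are no longer neighbors, so the near-field contribution vanishes and the interaction would have to come from the far-field translations.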

Tags: Algorithms, Computer science, CUDA, Fast multipole method, Heterogeneous systems, N-body simulation, nVidia, nVidia GeForce GTX 480, Tesla C2050, Tesla S1070

December 29, 2011 by hgpu
