high performance computing on graphics processing units: hgpu.org

Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures

Qi Hu, Nail A. Gumerov, Ramani Duraiswami

University of Maryland, College Park

2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11), 2011

DOI:10.1145/2063384.2063432

@article{hu2011scalable,
  title={Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures},
  author={Hu, Q. and Gumerov, N.A. and Duraiswami, R.},
  year={2011}
}

We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop. Using the observation that the local summation and the analysis-based translation parts of the FMM are independent, we map these respectively to the GPUs and CPUs. Careful analysis of the FMM is performed to distribute work optimally between the multicore CPUs and the GPU accelerators. We first develop a single node version where the CPU part is parallelized using OpenMP and the GPU version via CUDA. New parallel algorithms for creating FMM data structures are presented together with load balancing strategies for the single node and distributed multiple-node versions. Our implementation can perform the N-body sum for 128M particles on 16 nodes in 4.23 seconds, a performance not achieved by others in the literature on such clusters.
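The independence of the local summation and the translation-based part can be illustrated with a small sketch. This is not the authors' implementation: the function names (`box_index`, `is_neighbor`, `split_sum`, `direct_sum`), the unit-cube domain, the uniform grid at a single level, and the Laplace kernel 1/r are all assumptions made for illustration. The far field is summed directly here as a stand-in for the multipole translations an actual FMM would use; the point is only that the near and far contributions are disjoint, can be computed separately, and add up to the full N-body sum.

```python
import math
import random


def box_index(p, level):
    """Integer (ix, iy, iz) of the box containing point p in the unit cube."""
    n = 1 << level  # number of boxes per side at this level
    return tuple(min(int(c * n), n - 1) for c in p)


def is_neighbor(a, b):
    """Boxes are neighbors if their indices differ by at most 1 on every axis."""
    return all(abs(i - j) <= 1 for i, j in zip(a, b))


def kernel(x, y):
    """Assumed Laplace kernel 1/|x - y|; the self-interaction is taken as zero."""
    r = math.dist(x, y)
    return 1.0 / r if r > 0.0 else 0.0


def split_sum(points, charges, level):
    """Return (near, far) potentials per target; near + far is the full sum."""
    boxes = [box_index(p, level) for p in points]
    near = [0.0] * len(points)
    far = [0.0] * len(points)
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            contrib = charges[j] * kernel(x, y)
            if is_neighbor(boxes[i], boxes[j]):
                near[i] += contrib  # local summation (mapped to GPUs in the paper)
            else:
                far[i] += contrib   # direct stand-in for the translation-based part
    return near, far


def direct_sum(points, charges):
    """Reference O(N^2) sum over all sources for every target."""
    return [sum(q * kernel(x, y) for y, q in zip(points, charges)) for x in points]


if __name__ == "__main__":
    random.seed(1)
    pts = [tuple(random.random() for _ in range(3)) for _ in range(100)]
    q = [random.uniform(-1.0, 1.0) for _ in range(100)]
    near, far = split_sum(pts, q, level=2)
    ref = direct_sum(pts, q)
    err = max(abs(n + f - r) for n, f, r in zip(near, far, ref))
    print("max |near + far - direct| =", err)
```

Because neither loop branch reads the other's accumulator, the two parts could run concurrently on different devices, which is the property the abstract relies on when assigning them to GPUs and CPUs.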

Tags: Algorithms, Computer science, CUDA, Fast multipole method, Heterogeneous systems, N-body simulation, nVidia, nVidia GeForce GTX 480, Tesla C2050, Tesla S1070

December 29, 2011 by hgpu
