high performance computing on graphics processing units: hgpu.org

A massively parallel adaptive fast-multipole method on heterogeneous architectures

Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan A. Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, George Biros

Georgia Institute of Technology, Atlanta, GA 30332

In SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (2009), pp. 1-12

DOI:10.1145/1654059.1654118

@conference{lashuk2009massively,
  title={A massively parallel adaptive fast-multipole method on heterogeneous architectures},
  author={Lashuk, I. and Chandramowlishwaran, A. and Langston, H. and Nguyen, T.A. and Sampath, R. and Shringarpure, A. and Vuduc, R. and Ying, L. and Zorin, D. and Biros, G.},
  booktitle={Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis},
  pages={1--12},
  year={2009},
  organization={ACM}
}

We present new scalable algorithms and a new implementation of our kernel-independent fast multipole method (Ying et al. ACM/IEEE SC ’03), in which we employ both distributed memory parallelism (via MPI) and shared memory/streaming parallelism (via GPU acceleration) to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, our implementation scales well up to 30 billion unknowns on 65K cores (AMD/CRAY-based Kraken system at NSF/NICS) for highly non-uniform point distributions. On GPU-enabled systems, we achieve a 30x speedup for problems of up to 256 million points on 256 GPUs (Lincoln at NSF/NCSA) over a comparable CPU-only implementation.
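For orientation, the computation that a fast multipole method accelerates is the direct pairwise sum of a two-body potential over all points, which costs O(N^2) work. The following is a minimal sketch of that baseline, not the authors' method or code. It assumes the 3D Laplace kernel 1/(4πr) as a representative non-oscillatory kernel (the abstract does not name a specific one), and the names `direct_potential`, `points`, and `densities` are hypothetical.

```python
import math


def direct_potential(points, densities):
    """Direct O(N^2) evaluation of u_i = sum_{j != i} q_j / (4*pi*|x_i - x_j|).

    Assumption: the 3D Laplace kernel stands in for a generic
    non-oscillatory two-body potential. This is the brute-force sum
    that an FMM approximates, not an FMM itself.
    """
    n = len(points)
    potentials = [0.0] * n
    for i in range(n):
        xi, yi, zi = points[i]
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            xj, yj, zj = points[j]
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            total += densities[j] / (4.0 * math.pi * r)
        potentials[i] = total
    return potentials


# Three unit-strength sources at hypothetical positions.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
densities = [1.0, 1.0, 1.0]
potentials = direct_potential(points, densities)
```

Because every target visits every source, doubling the number of points quadruples the work, which is why problems with hundreds of millions or billions of points call for a hierarchical method such as the one described in the abstract.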

Tags: Computer science, CUDA, Fast multipole method, GPU cluster, MPI, N-body simulation, nVidia, Programming techniques, Tesla S1070

November 27, 2010 by hgpu
