high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs – towards cost effective, high performance N-body simulation

A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs – towards cost effective, high performance N-body simulation

Tsuyoshi Hamada, Keigo Nitadori, Khaled Benkrid, Yousuke Ohno, Gentaro Morimoto, Tomonari Masada, Yuichiro Shibata, Kiyoshi Oguri, Makoto Taiji

Faculty of Engineering, Department of Computer and Information Sciences, Nagasaki University, 852-8521, Nagasaki, Japan

Computer Science – Research and Development, Volume 24, Numbers 1-2, 21-31, (2009)

DOI:10.1007/s00450-009-0089-1

@article{hamada2009novel,

title={A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs–towards cost effective, high performance N-body simulation},

author={Hamada, T. and Nitadori, K. and Benkrid, K. and Ohno, Y. and Morimoto, G. and Masada, T. and Shibata, Y. and Oguri, K. and Taiji, M.},

journal={Computer Science-Research and Development},

volume={24},

number={1},

pages={21–31},

issn={1865-2034},

year={2009},

publisher={Springer}

}

Source

2342

views

Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. Among these applications figure astrophysical N-bodysimulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple O(N^2) algorithm was used for GPGPUs, and the resulting performances were not observed to be better than those of conventional CPUs that were based on more optimized O(N log N) algorithms such as the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC clusters for N-bodysimulations. In this paper, we report a new method for efficient parallel implementation of the tree algorithm on GPUs. Our novel tree code allows the realization of an N-bodysimulation on a GPU cluster at a much higher performance than that on general PC clusters. We practically performed a cosmological simulation with 562 million particles on a GPU cluster using 128 NVIDIA GeForce 8800GTS GPUs at an overall cost of 168172$. We obtained a sustained performance of 20.1 Tflops, which when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved cost/performance was hence a mere $19.8/Gflops which shows the high competitiveness of GPGPUs.

Tags: Algorithms, Computer science, CUDA, GPU cluster, N-body simulation, nVidia, nVidia GeForce 8800 GTS

November 20, 2010 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs – towards cost effective, high performance N-body simulation

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs – towards cost effective, high performance N-body simulation

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)