high performance computing on graphics processing units: hgpu.org

Implementation of a Parallel Tree Method on a GPU

Naohito Nakasato

Department of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan

arXiv:1112.4539v1 [astro-ph.IM] (20 Dec 2011)

@article{2011arXiv1112.4539N,
  author={{Nakasato}, N.},
  title={{Implementation of a Parallel Tree Method on a GPU}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1112.4539},
  primaryClass={astro-ph.IM},
  keywords={Astrophysics – Instrumentation and Methods for Astrophysics, Astrophysics – Galaxy Astrophysics, Computer Science – Performance},
  year={2011},
  month={dec},
  adsurl={http://adsabs.harvard.edu/abs/2011arXiv1112.4539N},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

The kd-tree is a fundamental tool in computer science. Among other applications, the application of kd-tree search (by the tree method) to the fast evaluation of particle interactions and neighbor search is highly important, since the computational complexity of these problems is reduced from O(N^2) for a brute force method to O(N log N) for the tree method, where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We present a detailed description of how we have implemented the tree method on a Cypress GPU. An optimization that we found important is localized particle ordering to effectively utilize cache memory. We present a number of test results and performance measurements. Our results show that the execution of the tree traversal in a force calculation on a GPU is practical and efficient.
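
The abstract singles out localized particle ordering as an important optimization for cache use. One common way to obtain such an ordering is to sort particles along a space-filling curve, so that particles close in space end up close in memory. The Python sketch below uses a Morton (Z-order) key for this. It is an illustration only: the abstract does not say which ordering the paper uses, and the names `morton_key`, `quantize`, and `sort_particles_by_morton`, the 10 bits per axis, and the unit-cube bounds are all assumptions made for the example.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integer cell indices.

    Bit b of ix lands at position 3*b, of iy at 3*b + 1, of iz at 3*b + 2.
    (Hypothetical helper; not taken from the paper.)
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def quantize(c, lo=0.0, hi=1.0, bits=10):
    """Map a coordinate in [lo, hi) to an integer cell index, clamped to range."""
    cells = 1 << bits
    i = int((c - lo) / (hi - lo) * cells)
    return min(cells - 1, max(0, i))


def sort_particles_by_morton(positions, lo=0.0, hi=1.0, bits=10):
    """Return the positions sorted by Morton key (assumed cubic domain [lo, hi)^3)."""
    def key(p):
        return morton_key(
            quantize(p[0], lo, hi, bits),
            quantize(p[1], lo, hi, bits),
            quantize(p[2], lo, hi, bits),
            bits,
        )
    return sorted(positions, key=key)


if __name__ == "__main__":
    particles = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.6, 0.1, 0.1)]
    for p in sort_particles_by_morton(particles):
        print(p)
```

After sorting, particles that share a tree cell tend to occupy a contiguous range of the array, which is the property a cache-friendly tree traversal relies on. A Peano-Hilbert key would be a drop-in alternative for `morton_key` with somewhat better locality at the cost of a more involved key computation.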

Tags: Astrophysics, ATI, ATI IL, ATI Radeon HD 5870, Computational Complexity, Computer science, Galaxy Astrophysics, Instrumentation and Methods for Astrophysics, KD-tree, OpenCL, Optimization, Performance

December 21, 2011 by hgpu
