high performance computing on graphics processing units: hgpu.org

Fully Parallel Particle Learning for GPGPUs and Other Parallel Devices

Kenichiro McAlinn, Hiroaki Katsura, Teruo Nakatsuma

Graduate School of Economics, Keio University, 2-15-45 Mita, Minato-ku, Tokyo, Japan

arXiv:1212.1639 [stat.CO] (7 Dec 2012)

@article{2012arXiv1212.1639M,
  author={{McAlinn}, K. and {Katsura}, H. and {Nakatsuma}, T.},
  title={{Fully Parallel Particle Learning for GPGPUs and Other Parallel Devices}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1212.1639},
  primaryClass={stat.CO},
  keywords={Statistics - Computation},
  year={2012},
  month={dec},
  adsurl={http://adsabs.harvard.edu/abs/2012arXiv1212.1639M},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

We developed a novel parallel algorithm for particle filtering (and learning) designed specifically for GPUs (graphics processing units) and similar parallel computing devices. In our algorithm, every step of a full particle-filtering cycle can be executed in a massively parallel manner: computing the likelihood of each particle, constructing the cumulative distribution function (CDF) used for resampling, resampling the particles with that CDF, and propagating new particles to the next cycle. A key advantage is that every numerical computation and memory access involved in particle filtering is carried out entirely on the GPU. Data are transferred between the GPU's device memory and the CPU's host memory only when the generated particles must be moved to host memory for further processing, which sidesteps the limited memory bandwidth between the GPU and the CPU. To demonstrate this advantage, we conducted a Monte Carlo experiment in which we estimated a simple state space model via particle learning with both the parallel algorithm and conventional sequential algorithms, and compared their execution times. The results showed that the parallel algorithm was far superior to the sequential algorithms in execution time.
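The four stages of one filtering cycle described above can be illustrated with a small sketch. This is a minimal, sequential Python sketch and not the authors' GPU implementation. It assumes a hypothetical local-level model, x_t = x_{t-1} + N(0, q) and y_t = x_t + N(0, r) (the abstract does not specify the model), uses systematic resampling (our choice), and omits the parameter sufficient-statistic updates that distinguish particle learning from a plain bootstrap filter. The function names `gaussian_loglik` and `filter_step` are hypothetical. Each stage is written so that it maps onto a data-parallel primitive: likelihood evaluation and propagation are independent per-particle maps, the CDF is an inclusive prefix sum (a parallel scan), and each resampled slot performs its own binary search into the CDF.

```python
import bisect
import itertools
import math
import random


def gaussian_loglik(y, x, r):
    """Log density of N(y | x, r) under the assumed observation model."""
    return -0.5 * (math.log(2.0 * math.pi * r) + (y - x) ** 2 / r)


def filter_step(particles, y, q=0.1, r=1.0, rng=None):
    """One bootstrap-filter cycle (hypothetical sketch, not the paper's code)."""
    rng = rng or random.Random()
    n = len(particles)

    # 1. Likelihood of each particle: an independent per-particle map.
    logw = [gaussian_loglik(y, x, r) for x in particles]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]  # shift by max for numerical stability

    # 2. Unnormalized CDF via an inclusive prefix sum (a parallel scan on a GPU).
    cdf = list(itertools.accumulate(w))
    total = cdf[-1]  # >= 1 because the largest shifted weight is exp(0)

    # 3. Systematic resampling: slot i searches the CDF for its own target,
    #    so all n searches are independent of one another.
    u0 = rng.random() / n
    idx = [bisect.bisect_right(cdf, (u0 + i / n) * total) for i in range(n)]
    idx = [min(j, n - 1) for j in idx]  # guard against rounding at the top end

    # 4. Propagate: x_t = x_{t-1} + N(0, q), again independently per particle.
    sd = math.sqrt(q)
    return [particles[j] + rng.gauss(0.0, sd) for j in idx]
```

On a GPU, each list comprehension would typically become one kernel launch (or a device-wide scan for the CDF), with the particle array kept in device memory between cycles. This mirrors the property the abstract emphasizes: no host transfer is needed inside the loop.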

Tags: Algorithms, Bayesian, CUDA, nVidia, nVidia GeForce GTX 580, Particle filtering, Statistics

December 10, 2012 by hgpu

