Parallel Exact Inference on a CPU-GPGPU Heterogenous System
Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA
39th International Conference on Parallel Processing (ICPP), 2010
@conference{jeon2010parallel,
title={Parallel Exact Inference on a CPU-GPGPU Heterogenous System},
author={Jeon, H. and Xia, Y. and Prasanna, V.K.},
booktitle={2010 39th International Conference on Parallel Processing},
pages={61--70},
issn={0190-3918},
year={2010},
organization={IEEE}
}
Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model, and achieving scalability across hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler dynamically merges multiple small cliques or splits large cliques to maximize utilization of GPGPU resources. We implement node-level primitives on the GPGPU to process the cliques assigned by the CPU. We propose a conflict-free potential table organization and an efficient data layout for coalesced memory accesses. In addition, we develop a double-buffering-based asynchronous data transfer between the CPU and GPGPU to overlap clique processing on the GPGPU with data transfer and scheduling activities. Our implementation achieved a 30x speedup compared with state-of-the-art multicore processors.
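The abstract describes overlapping clique processing on the GPGPU with host-device transfers via double buffering. The CUDA sketch below illustrates that general pattern, assuming two pinned host buffers and two streams that alternate per clique; the kernel, sizes, and names (process_clique, TABLE_BYTES, NUM_CLIQUES) are illustrative placeholders and not the authors' implementation.

// Minimal double-buffering sketch: while one stream runs the kernel for
// clique c, the other stream copies the data for clique c+1. All names and
// sizes here are assumptions for illustration only.
#include <cuda_runtime.h>
#include <stdio.h>

#define TABLE_BYTES (1 << 20)   // assumed size of one clique potential table
#define NUM_CLIQUES 8           // assumed number of cliques to process

// Placeholder kernel standing in for the paper's node-level primitives
__global__ void process_clique(float *table, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) table[i] *= 2.0f;   // dummy update of the potential table
}

int main(void) {
    int n = TABLE_BYTES / sizeof(float);
    float *h_buf[2], *d_buf[2];
    cudaStream_t stream[2];

    // Pinned host memory is required for truly asynchronous copies
    for (int b = 0; b < 2; ++b) {
        cudaMallocHost(&h_buf[b], TABLE_BYTES);
        cudaMalloc(&d_buf[b], TABLE_BYTES);
        cudaStreamCreate(&stream[b]);
        for (int i = 0; i < n; ++i) h_buf[b][i] = 1.0f;  // fake clique data
    }

    for (int c = 0; c < NUM_CLIQUES; ++c) {
        int b = c & 1;  // alternate between the two buffers/streams
        // Copy and process clique c in stream b; the other stream can still
        // be busy with the previous clique, so transfer and compute overlap.
        cudaMemcpyAsync(d_buf[b], h_buf[b], TABLE_BYTES,
                        cudaMemcpyHostToDevice, stream[b]);
        process_clique<<<(n + 255) / 256, 256, 0, stream[b]>>>(d_buf[b], n);
        cudaMemcpyAsync(h_buf[b], d_buf[b], TABLE_BYTES,
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) {
        cudaStreamDestroy(stream[b]);
        cudaFree(d_buf[b]);
        cudaFreeHost(h_buf[b]);
    }
    printf("done\n");
    return 0;
}

Because operations issued to the same stream execute in order, reusing a buffer two iterations later is safe without explicit events in this sketch; a production scheduler like the one described in the paper would also interleave its merge/split decisions with these transfers.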
April 4, 2011 by hgpu