## Acceleration Methods for Bayesian Network Sampling

Department of Computer Science, College of Arts and Sciences, The Florida State University, 2011

```bibtex
@phdthesis{yu2011acceleration,
  title={Acceleration Methods for Bayesian Network Sampling},
  author={Yu, H.},
  school={The Florida State University},
  year={2011}
}
```

Bayesian inference with Bayesian networks is a #P-complete problem in general. Exact Bayesian inference is feasible in practice only on small-scale Bayesian networks or on networks that are dramatically simplified, such as with naive Bayes or other approximations. Stochastic sampling methods, in particular importance sampling, form one of the most prominent and efficient families of approximate inference techniques that give answers in reasonable computing time. A critical problem for importance sampling is choosing an importance function that efficiently and accurately samples the posterior probability distribution given a set of observed variables (also referred to as evidence). Choosing or designing an importance function faces two major hurdles: how to approximate the posterior probability distribution accurately, and how to reduce the generation of undesirable inconsistent samples caused by deterministic relations in the network (also known as the rejection problem).

In my dissertation I propose Refractor Importance Sampling, a family of importance sampling algorithms that effectively break the lower bound on the error of the importance function, so that it can approach the posterior probability distribution of a Bayesian network given evidence. The aim is to increase the convergence rate of the approximation and to reduce sampling variance. I also propose Posterior Subgraph Clustering, which improves the sampling process by breaking a network into several small independent subgraphs. To address the rejection problem, I propose the Zero Probability Backfilling and Compressed Vertex Tree methods to detect and store the deterministic constraints that are the root cause of inconsistent samples. The rejection problem is NP-hard, and I prove in my dissertation that even limiting the inconsistent samples to a certain degree with positive probability cannot be done in polynomial time.
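To make the rejection problem concrete, here is a minimal sketch (not any of the dissertation's algorithms) of standard likelihood-weighted importance sampling on a hypothetical two-node network where a deterministic child CPT (`B = A`) forces every sample inconsistent with the evidence to carry zero weight:

```python
import random

random.seed(0)

# Toy network: A ~ Bernoulli(0.3); B is deterministic, B = A.
# Evidence: B = 1. Likelihood weighting samples A from its prior and
# weights each sample by P(B=1 | A); the deterministic CPT makes every
# A=0 sample carry zero weight -- an "inconsistent" (rejected) sample.
def p_b_given_a(b, a):
    return 1.0 if b == a else 0.0

def likelihood_weighting(n):
    total_w, w_a1, rejected = 0.0, 0.0, 0
    for _ in range(n):
        a = 1 if random.random() < 0.3 else 0   # sample A from its prior
        w = p_b_given_a(1, a)                   # weight by evidence B = 1
        if w == 0.0:
            rejected += 1                       # wasted, inconsistent sample
        total_w += w
        if a == 1:
            w_a1 += w
    return w_a1 / total_w, rejected / n

est, reject_rate = likelihood_weighting(100_000)
print(f"P(A=1 | B=1) ~ {est:.3f}, rejection rate ~ {reject_rate:.3f}")
```

Roughly 70% of samples are rejected here because the prior, used as the importance function, ignores the deterministic constraint — the waste the dissertation's constraint-detection methods aim to eliminate.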
I propose the k-test to measure the hardness of the rejection problem by scaling the clause density of a CNF formula constructed from a Bayesian network.

In the final part of my dissertation, I design and implement importance sampling algorithms for GPU platforms to speed up sampling by exploiting fine-grained parallelism. GPUs are cost-effective computing devices; however, the random, memory-intensive access patterns of Bayesian network sampling make it difficult to obtain computational speedups on them. I propose a new method, Parallel Irregular Wavefront Sampling, which improves memory access on the GPU by leveraging the conditional independence relations between variables in a network. I improve this scheme further by using the posterior distribution as an oracle to reduce costly memory fetches on the GPU. The proposed methods are implemented and experimentally tested; the results show significant gains in the efficiency and accuracy of Bayesian network sampling for Bayesian inference on real-world and synthetic benchmark Bayesian networks.
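The wavefront idea rests on a standard graph observation: variables at the same topological depth have no ancestor/descendant relation among themselves, so they can be sampled concurrently once all earlier levels are done. A minimal sketch of this levelization on a made-up five-node DAG (not the dissertation's irregular-wavefront scheduler):

```python
from collections import defaultdict

# Hypothetical DAG given as child -> parent lists. Wavefront i holds all
# variables whose parents lie in earlier wavefronts, so every variable in
# a wavefront can be sampled in parallel (e.g. one GPU thread per node).
parents = {
    "A": [], "B": [], "C": ["A"], "D": ["A", "B"], "E": ["C", "D"],
}

def wavefronts(parents):
    depth = {}

    def level(v):
        # depth = 0 for roots, else 1 + deepest parent
        if v not in depth:
            depth[v] = 0 if not parents[v] else 1 + max(level(p) for p in parents[v])
        return depth[v]

    waves = defaultdict(list)
    for v in parents:
        waves[level(v)].append(v)
    return [sorted(waves[d]) for d in sorted(waves)]

print(wavefronts(parents))  # [['A', 'B'], ['C', 'D'], ['E']]
```

Real Bayesian networks produce wavefronts of very uneven width — the "irregular" part of the problem — which is why memory layout and scheduling on the GPU matter as much as the partitioning itself.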

November 2, 2011 by hgpu