High-throughput Bayesian computing machine with reconfigurable hardware
Department of Electrical Engineering and Computer Science, University of California at Berkeley, CA 94704
In Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays (2010), pp. 73-82.
@conference{lin2010high,
title={High-throughput {Bayesian} computing machine with reconfigurable hardware},
author={Lin, M. and Lebedev, I. and Wawrzynek, J.},
booktitle={Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays},
pages={73–82},
year={2010},
organization={ACM}
}
We use reconfigurable hardware to construct a high-throughput Bayesian computing machine (BCM) capable of evaluating probabilistic networks with arbitrary DAG (directed acyclic graph) topology. Our BCM achieves high throughput by exploiting the FPGA’s distributed memories and abundant hardware structures (such as long carry-chains and registers), which enable us to 1) develop an innovative memory allocation scheme based on a maximal matching algorithm that completely avoids memory stalls, 2) optimize and deeply pipeline the logic design of each processing node, and 3) schedule the processing nodes optimally. The BCM architecture we present can not only be applied to many important algorithms in artificial intelligence, signal processing, and digital communications, but also offers high reusability, i.e., a new application does not require changes to the BCM’s hardware design; only new task-graph processing and code compilation are needed. Moreover, the throughput of a BCM scales almost linearly with the size of the FPGA on which it is implemented. A prototype Bayesian computing machine with 16 processing nodes was implemented with a Virtex-5 FPGA (XCV5LX155T-2) on a BEE3 (Berkeley Emulation Engine) platform. For a wide variety of sample Bayesian problems, compared with running the same network evaluation algorithm on a 2.4 GHz Intel Core 2 Duo processor and on a GeForce 9400M using the CUDA software package, the BCM demonstrates 80x and 15x speedups, respectively, with a peak throughput of 20.4 GFLOPS (Giga Floating-Point Operations per Second).
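To make the memory-allocation idea concrete, below is a minimal Python sketch (not the authors' implementation) of how a maximum bipartite matching, computed with Kuhn's augmenting-path method, can map the operands fetched in a single clock cycle onto distinct memory banks so that no two simultaneous reads contend for the same bank. The function name match_operands_to_banks and the allowed placement relation are hypothetical illustrations, not taken from the paper.

# Illustrative sketch only -- not the BCM authors' allocator.
# Maps operands read in one clock cycle onto distinct memory banks using
# maximum bipartite matching (Kuhn's augmenting-path method). The `allowed`
# relation (which banks may hold which operand) is a hypothetical input,
# e.g. derived from bank capacities or earlier placement decisions.

def match_operands_to_banks(operands, banks, allowed):
    """Return a dict operand -> bank in which no bank is used twice."""
    bank_of = {}   # operand -> bank chosen so far
    holder = {}    # bank -> operand currently matched to it

    def augment(op, visited):
        # Try to place `op`, possibly re-routing operands placed earlier.
        for b in banks:
            if b in visited or b not in allowed.get(op, ()):
                continue
            visited.add(b)
            if b not in holder or augment(holder[b], visited):
                holder[b] = op
                bank_of[op] = b
                return True
        return False

    for op in operands:
        augment(op, set())
    return bank_of

if __name__ == "__main__":
    # Four operands read in parallel, four banks; the placement constraints
    # below are purely hypothetical.
    operands = ["A", "B", "C", "D"]
    banks = [0, 1, 2, 3]
    allowed = {"A": {0, 1}, "B": {0}, "C": {1, 2}, "D": {2, 3}}
    print(match_operands_to_banks(operands, banks, allowed))
    # One maximum matching: {'A': 1, 'B': 0, 'C': 2, 'D': 3}

Per the abstract, this kind of allocation is resolved offline (as part of task-graph processing and code compilation), so the hardware itself never stalls on bank conflicts at run time.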
January 9, 2011 by hgpu