Efficient simulation of agent-based models on multi-GPU and multi-core clusters

Brandon G. Aaby, Kalyan S. Perumalla, Sudip K. Seal

Oak Ridge National Laboratory, Oak Ridge, Tennessee

SIMUTools ’10 Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques

DOI:10.4108/ICST.SIMUTOOLS2010.8822

@conference{aaby2010efficient,
  title={Efficient simulation of agent-based models on multi-GPU and multi-core clusters},
  author={Aaby, B.G. and Perumalla, K.S. and Seal, S.K.},
  booktitle={Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques},
  pages={1--10},
  year={2010},
  organization={ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering)}
}

An effective latency-hiding mechanism is presented for the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the trade-off. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphics processing units (GPUs), and pthreads on multi-core processors. The Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, which delivers over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on a system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular Java-based simulator. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
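The abstract does not spell out the latency-hiding mechanism or the analytical model, so the following is only a minimal sketch of one way a computation vs. communication trade-off can be expressed. It assumes a square grid block per processor, a nearest-neighbor update rule, and a halo of width `halo` that is exchanged once every `halo` steps at the price of redundant computation on the halo cells. Every name and parameter here (`time_per_step`, `best_halo`, `latency`, `per_cell_cost`, `per_cell_transfer`) is hypothetical and not taken from the paper.

```python
def time_per_step(block, halo, latency, per_cell_cost, per_cell_transfer):
    """Average cost of one simulation step under an assumed halo scheme.

    A block of block x block cells receives a halo of width `halo` once
    every `halo` steps. With a nearest-neighbor rule the valid region
    shrinks by one cell per side per step, so step k (k = 0..halo-1)
    updates a square of side block + 2 * (halo - 1 - k).
    """
    # Useful work plus redundant work on halo cells, summed over one cycle.
    compute = sum(
        per_cell_cost * (block + 2 * (halo - 1 - k)) ** 2
        for k in range(halo)
    )
    # One exchange per cycle: fixed latency plus a cost per halo cell.
    halo_cells = (block + 2 * halo) ** 2 - block ** 2
    communicate = latency + per_cell_transfer * halo_cells
    # Amortize the whole cycle over the `halo` steps it covers.
    return (compute + communicate) / halo


def best_halo(block, latency, per_cell_cost, per_cell_transfer, max_halo=64):
    """Halo width in 1..max_halo with the lowest average step cost."""
    return min(
        range(1, max_halo + 1),
        key=lambda r: time_per_step(
            block, r, latency, per_cell_cost, per_cell_transfer
        ),
    )
```

Under these assumptions, a zero-latency link favors exchanging every step (`halo = 1`), while a high fixed latency favors wider halos that trade extra computation for fewer messages. The cost units are arbitrary and would need to be measured on a real platform.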

Tags: Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, OpenMPI, Performance

November 22, 2010 by hgpu

* * *

high performance computing on graphics processing units: hgpu.org