Efficient Implementation of MrBayes on multi-GPU

Jie Bao, Hongju Xia, Jianfu Zhou, Xiaoguang Liu, Gang Wang
College of Information Technical Science, Nankai University, Tianjin, China
College of Information Technical Science, Nankai University, 2013


   title={PDF Proof: Mol. Biol. Evol.},

   author={Xia, H. and Zhou, J. and Wang, G.},







MrBayes, which uses Metropolis-coupled Markov chain Monte Carlo [MCMCMC, or (MC)^3 for short], is a popular program for Bayesian inference. Although (MC)^3 is a leading method for inferring phylogeny from DNA data, neither the original algorithm nor its improved and parallel versions are fast enough for biologists to analyze massive real-world DNA data sets. Recently, the Graphics Processing Unit (GPU) has shown its power as a co-processor (or rather, an accelerator) in many fields. This paper describes a(MC)^3 [aMCMCMC], an efficient implementation of MrBayes (MC)^3 on the Compute Unified Device Architecture (CUDA). By dynamically adjusting the task granularity to the input data size and hardware configuration, it makes full use of GPU cores across different data sets. An adaptive method is also developed to split and combine DNA sequences so that a large number of GPU cards can be fully used. Furthermore, a new "node-by-node" task scheduling strategy is developed to improve concurrency, and several optimizations reduce the extra overhead. Experimental results show that a(MC)^3 achieves up to 55x speedup over serial MrBayes on a single machine with one GPU card, up to 154x with four GPU cards, and up to 439x on a 32-node GPU cluster. a(MC)^3 is dramatically faster than all previous (MC)^3 algorithms and scales well to large GPU clusters.
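
For readers unfamiliar with (MC)^3, the following is a minimal CPU-only sketch of Metropolis coupling on a toy one-dimensional target. It is not the paper's GPU implementation and does not touch phylogenetic likelihoods. The target density, the function names (`log_target`, `swap_log_ratio`, `mc3`), the incremental heating rule beta_i = 1 / (1 + delta_t * i), and all parameter values are assumptions chosen for illustration. Each chain k samples the target raised to the power beta_k, and a proposed state swap between chains i and j is accepted with log-probability min(0, (beta_i - beta_j) * (log p(x_j) - log p(x_i))).

```python
import math
import random


def log_target(x):
    # Toy unnormalised bimodal density: two unit-variance Gaussians at -3 and +3.
    # (Hypothetical stand-in for a phylogenetic posterior.)
    a = -0.5 * (x - 3.0) ** 2
    b = -0.5 * (x + 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_log_ratio(beta_i, beta_j, logp_i, logp_j):
    # Log of the Metropolis ratio for exchanging the states of chains i and j.
    return (beta_i - beta_j) * (logp_j - logp_i)


def mc3(n_chains=4, n_steps=5000, delta_t=0.2, step=1.0, seed=0):
    rng = random.Random(seed)
    # Chain 0 is the "cold" chain (beta = 1); higher indices are heated.
    betas = [1.0 / (1.0 + delta_t * i) for i in range(n_chains)]
    states = [0.0] * n_chains
    logps = [log_target(x) for x in states]
    cold_samples = []
    for _ in range(n_steps):
        # Within-chain random-walk Metropolis update on the tempered target.
        for k in range(n_chains):
            prop = states[k] + rng.gauss(0.0, step)
            lp = log_target(prop)
            # 1 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - rng.random()) < betas[k] * (lp - logps[k]):
                states[k], logps[k] = prop, lp
        # Propose exchanging the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_r = swap_log_ratio(betas[i], betas[j], logps[i], logps[j])
        if math.log(1.0 - rng.random()) < log_r:
            states[i], states[j] = states[j], states[i]
            logps[i], logps[j] = logps[j], logps[i]
        # Only the cold chain's states are kept as posterior samples.
        cold_samples.append(states[0])
    return betas, cold_samples
```

In this sketch the chains are updated one after another in a loop. The within-chain updates are independent of each other between swap attempts, which is the kind of structure a parallel implementation can exploit.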