Montblanc: GPU accelerated Radio Interferometer Measurement Equations in support of Bayesian Inference for Radio Observations
Department of Computer Science, University of Cape Town, Rondebosch, South Africa, 7700
arXiv:1501.07719 [cs.DC], (30 Jan 2015)
@article{perkins2015montblanc,
title={Montblanc: GPU accelerated Radio Interferometer Measurement Equations in support of Bayesian Inference for Radio Observations},
author={Perkins, Simon and Marais, Patrick and Zwart, Jonathan and Natarajan, Iniyan and Smirnov, Oleg},
year={2015},
month={jan},
archivePrefix={arXiv},
eprint={1501.07719},
primaryClass={cs.DC}
}
We present Montblanc, a GPU implementation of the radio interferometer measurement equation (RIME) in support of the Bayesian inference for radio observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters to produce multiple model visibilities. Chi-squared values computed from the model and observed visibilities are used as likelihood values to drive the Bayesian sampling process and select the best sky model. As most of the elements of the RIME and chi-squared calculation are independent of one another, they are highly amenable to parallel computation. Additionally, Montblanc caters for iterative RIME evaluation to produce multiple chi-squared values. Only modified model parameters are transferred to the GPU between each iteration. We implemented Montblanc as a Python package based upon NVIDIA’s CUDA architecture. As such, it is easy to extend and implement different pipelines. At present, Montblanc supports point and Gaussian morphologies, but is designed for easy addition of new source profiles. Montblanc’s RIME implementation is performant: on an NVIDIA K40, it is approximately 250 times faster than MeqTrees on a dual hexacore Intel E5-2620v2 CPU. Compared to the OSKAR simulator’s GPU-implemented RIME components, it is 7.7 and 12 times faster on the same K40 for single- and double-precision floating point, respectively. However, OSKAR’s RIME implementation is more general than Montblanc’s BIRO-tailored RIME. Theoretical analysis of Montblanc’s dominant CUDA kernel suggests that it is memory bound. In practice, profiling shows that it is balanced between compute and memory, as much of the data required by the problem is retained in L1 and L2 cache.
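The evaluation loop described in the abstract (compute model visibilities from a sky model, then reduce them against the observed visibilities to a chi-squared value) can be illustrated with a minimal CPU sketch in plain Python. This is not Montblanc's API or its CUDA kernel: the function names, the scalar point-source form of the RIME, the phase sign convention, and the single noise level `sigma` are all assumptions made for illustration.

```python
import cmath
import math


def model_visibilities(sources, uvw):
    """Hypothetical scalar point-source RIME (illustrative, not Montblanc's code).

    sources: list of (flux, l, m) tuples, with l and m as direction cosines.
    uvw:     list of (u, v, w) baseline coordinates in wavelengths.
    Returns one complex model visibility per (u, v, w) sample.
    """
    vis = []
    for u, v, w in uvw:
        total = 0j
        for flux, l, m in sources:
            n = math.sqrt(1.0 - l * l - m * m)
            # Assumed sign convention for the geometric phase term.
            phase = -2.0 * math.pi * (u * l + v * m + w * (n - 1.0))
            total += flux * cmath.exp(1j * phase)
        vis.append(total)
    return vis


def chi_squared(observed, model, sigma):
    """Sum of squared residual amplitudes, scaled by an assumed uniform noise sigma."""
    return sum(abs(o - m) ** 2 for o, m in zip(observed, model)) / sigma ** 2


def log_likelihood(observed, model, sigma):
    """Gaussian log-likelihood up to an additive constant, as a sampler might use it."""
    return -0.5 * chi_squared(observed, model, sigma)
```

Each term of the double loop and each residual in the sum is independent of the others, which is the property the abstract cites as making the calculation amenable to GPU parallelisation. A sampler would call `model_visibilities` repeatedly with different candidate `sources` and compare the resulting likelihood values.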
February 2, 2015 by hgpu