DC IM CV | Jan 30, 2015

Montblanc: GPU accelerated Radio Interferometer Measurement Equations in support of Bayesian Inference for Radio Observations

Simon Perkins, Patrick Marais, Jonathan Zwart, Iniyan Natarajan, Cyril Tasse, Oleg Smirnov

arXiv:1501.07719v32.319 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem for radio astronomy researchers by providing a highly efficient tool for Bayesian sky model selection, though it is incremental as it builds on existing GPU acceleration techniques tailored to a specific inference method.

The paper tackles the computational bottleneck in Bayesian inference for radio observations by presenting Montblanc, a GPU-accelerated implementation of the Radio Interferometer Measurement Equation. On an NVIDIA K40, it runs approximately 250 times faster than the CPU-based MeqTrees, and 7.7 times (single precision) and 12 times (double precision) faster than the GPU-implemented RIME components of the OSKAR simulator.

We present Montblanc, a GPU implementation of the Radio Interferometer Measurement Equation (RIME) in support of the Bayesian Inference for Radio Observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters to produce multiple model visibilities. Chi-squared values computed from the model and observed visibilities are used as likelihood values to drive the Bayesian sampling process and select the best sky model. As most of the elements of the RIME and chi-squared calculation are independent of one another, they are highly amenable to parallel computation. Additionally, Montblanc caters for iterative RIME evaluation to produce multiple chi-squared values. Modified model parameters are transferred to the GPU between each iteration. We implemented Montblanc as a Python package based upon NVIDIA's CUDA architecture. As such, it is easy to extend and implement different pipelines. At present, Montblanc supports point and Gaussian morphologies, but is designed for easy addition of new source profiles. Montblanc's RIME implementation is performant: on an NVIDIA K40, it is approximately 250 times faster than MeqTrees on a dual hexacore Intel E5-2620v2 CPU. Compared to the OSKAR simulator's GPU-implemented RIME components, it is 7.7 and 12 times faster on the same K40 for single and double-precision floating point respectively. However, OSKAR's RIME implementation is more general than Montblanc's BIRO-tailored RIME. Theoretical analysis of Montblanc's dominant CUDA kernel suggests that it is memory bound. In practice, profiling shows that it is balanced between compute and memory, as much of the data required by the problem is retained in L1 and L2 cache.
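To make the evaluate-then-score loop described in the abstract concrete, the following is a minimal serial sketch in plain Python. It is not Montblanc's API or its CUDA kernel. It assumes a scalar (Stokes I only) RIME for point sources, the common phase convention exp(-2πi(ul + vm + w(n - 1))ν/c), and a single noise level `sigma` for all visibilities. The function names and the `(l, m, flux)` source tuple are hypothetical, chosen only for illustration.

```python
import cmath
import math

C = 299792458.0  # speed of light in m/s


def point_source_visibility(uvw, freq, sources):
    """Model visibility for one baseline and one frequency channel.

    uvw     -- (u, v, w) baseline coordinates in metres
    freq    -- channel frequency in Hz
    sources -- iterable of (l, m, flux) point sources (hypothetical layout)
    """
    u, v, w = uvw
    vis = 0j
    for l, m, flux in sources:
        n = math.sqrt(1.0 - l * l - m * m)
        # Extra path length (metres) relative to the phase centre.
        path = u * l + v * m + w * (n - 1.0)
        vis += flux * cmath.exp(-2j * math.pi * path * freq / C)
    return vis


def model_visibilities(baselines, freqs, sources):
    """Each (baseline, channel) entry is computed independently of the
    others, which is what makes the problem suit a GPU."""
    return [[point_source_visibility(b, f, sources) for f in freqs]
            for b in baselines]


def chi_squared(model, observed, sigma):
    """Sum of squared residual amplitudes, scaled by the noise variance."""
    total = 0.0
    for model_row, observed_row in zip(model, observed):
        for mv, ov in zip(model_row, observed_row):
            total += abs(ov - mv) ** 2
    return total / sigma ** 2
```

In a BIRO-style run, a sampler would call `model_visibilities` with trial source parameters and use the resulting `chi_squared` value as the likelihood term. A trial sky model closer to the one that generated the observed visibilities yields a smaller value.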
