ME · AI · LG · CO · Jun 11, 2021

DG-LMC: A Turn-key and Scalable Synchronous Distributed MCMC Algorithm via Langevin Monte Carlo within Gibbs

arXiv:2106.06300v2 · 18 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of inefficient and unreliable distributed Bayesian inference for researchers and practitioners handling large datasets, representing an incremental improvement over existing methods.

The paper tackles the challenge of performing reliable Bayesian inference on big data by proposing DG-LMC, a synchronous distributed MCMC algorithm with provable scaling in high-dimensional settings. Its relevance is illustrated on synthetic and real data experiments.

Performing reliable Bayesian inference at big data scale is becoming a keystone of the modern era of machine learning. Markov chain Monte Carlo (MCMC) algorithms are a workhorse class of methods for this task, and their design for distributed datasets has been the subject of many works. However, existing methods are either not fully reliable or not computationally efficient. In this paper, we propose to fill this gap in the case where the dataset is partitioned and stored on computing nodes within a cluster under a master/slaves architecture. We derive a user-friendly centralised distributed MCMC algorithm with provable scaling in high-dimensional settings. We illustrate the relevance of the proposed methodology on both synthetic and real data experiments.
