LG SP Dec 17, 2021

Robust Distributed Bayesian Learning with Stragglers via Consensus Monte Carlo

arXiv:2112.09794v2
Originality: Incremental advance
AI Analysis

This paper addresses straggler mitigation in distributed Bayesian learning, an incremental advance over existing consensus Monte Carlo methods.

The paper tackles the problem of stragglers in distributed Bayesian learning by generalizing consensus Monte Carlo (CMC) with grouping and coding methods. Its simulations suggest that Coded CMC may outperform Group-based CMC when the number of workers is small, while Group-based CMC is generally preferable when the number of workers is larger.

This paper studies distributed Bayesian learning in a setting encompassing a central server and multiple workers by focusing on the problem of mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. Two main challenges in designing straggler-resilient algorithms for CMC are the need to estimate the statistics of the workers' outputs across multiple shots, and the joint non-linear post-processing of the outputs of the workers carried out at the server. This is in stark contrast to other distributed settings like gradient coding, which only require the per-shot sum of the workers' outputs. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
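For orientation, the baseline that the abstract generalizes can be sketched as follows. This is a minimal illustration of the standard one-shot CMC combination step with Gaussian-style weights (each worker's samples weighted by the inverse of its sample covariance), not the paper's G-CMC or C-CMC, and it assumes every worker responds. The function name `cmc_combine` and the array layout `(K, S, d)` are assumptions made for this example. Because the weights are estimated jointly from all shots and the aggregation is not a plain sum, the sketch also shows why losing a worker's output is harder to patch here than in gradient coding.

```python
import numpy as np


def cmc_combine(worker_samples):
    """Combine subposterior samples from K workers into global samples.

    worker_samples: array of shape (K, S, d), i.e. K workers, S samples
    (shots) per worker, d-dimensional parameter. Hypothetical layout.
    """
    worker_samples = np.asarray(worker_samples, dtype=float)
    K, S, d = worker_samples.shape

    # Weight of worker k: inverse of its sample covariance, estimated
    # across all S shots (this is the multi-shot statistic the server needs).
    weights = np.stack([
        np.linalg.inv(np.atleast_2d(np.cov(worker_samples[k], rowvar=False)))
        for k in range(K)
    ])  # shape (K, d, d)

    # Per-shot weighted sum over workers: sum_k W_k @ theta_k^(s).
    weighted = np.einsum("kij,ksj->si", weights, worker_samples)  # (S, d)

    # Normalize by the total weight: (sum_k W_k)^{-1} @ weighted sum.
    total = weights.sum(axis=0)  # (d, d)
    return np.linalg.solve(total, weighted.T).T  # (S, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy model (an assumption for this demo): unit-variance Gaussian data,
    # flat prior on the mean, so each subposterior is N(shard mean, 1/n_k).
    data = rng.normal(1.5, 1.0, size=(4, 50))  # 4 workers, 50 points each
    subposteriors = np.stack([
        rng.normal(shard.mean(), 1.0 / np.sqrt(shard.size), size=(5000, 1))
        for shard in data
    ])
    combined = cmc_combine(subposteriors)
    print("combined mean:", combined.mean(), "full-data mean:", data.mean())
    print("combined var:", combined.var(), "exact posterior var:", 1.0 / data.size)
```

In this toy Gaussian case the combined samples should approximately match the full-data posterior; for non-Gaussian subposteriors the weighted average is only an approximation.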
