LGMLApr 19, 2013

Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget

arXiv:1304.5299v4249 citations
Originality Incremental advance
AI Analysis

This addresses the problem of high computational costs in MCMC sampling for large datasets, offering a domain-specific improvement that is incremental in nature.

The paper tackles the computational inefficiency of Bayesian posterior MCMC sampling on large datasets by introducing an approximate Metropolis-Hastings rule based on sequential hypothesis testing, which reduces data usage per decision and increases sampling speed, though it introduces a controlled asymptotic bias.

Can we make Bayesian posterior MCMC sampling more efficient when faced with very large datasets? We argue that computing the likelihood for N datapoints in the Metropolis-Hastings (MH) test to reach a single binary decision is computationally inefficient. We introduce an approximate MH rule based on a sequential hypothesis test that allows us to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule. While this method introduces an asymptotic bias, we show that this bias can be controlled and is more than offset by a decrease in variance due to our ability to draw more samples per unit of time.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes