LG | CO | Nov 20, 2022

Non-reversible Parallel Tempering for Deep Posterior Approximation

arXiv:2211.10837v1 | 6 citations | h-index: 34
Originality: Incremental advance
AI Analysis

This work improves the computational efficiency of multi-modal posterior approximation in big data settings, which benefits practitioners in Bayesian statistics and machine learning. The advance is incremental: it builds on existing parallel tempering methods.

The paper tackles the inefficiency of parallel tempering in big data scenarios, where limited chains and few bias-corrected swaps erode the benefit of the deterministic even-odd (DEO) scheme. It generalizes DEO to promote non-reversibility and proposes solutions for the bias caused by the geometric stopping time, reaching a communication cost of $O(P\log P)$ at the optimal window size. Exploration uses SGD with large, constant learning rates, so complex posteriors can be approximated without extensive tuning.

Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from $O(P^2)$ to $O(P)$ given sufficiently many $P$ chains. However, such an innovation largely disappears in big data due to the limited chains and few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain an appealing communication cost $O(P\log P)$ based on the optimal window size. In addition, we also adopt stochastic gradient descent (SGD) with large and constant learning rates as exploration kernels. Such a user-friendly nature enables us to conduct approximation tasks for complex posteriors without much tuning costs.
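To make the swap scheme concrete, the following is a minimal sketch of parallel tempering with deterministic even-odd swaps on a toy one-dimensional bimodal target. It is an illustration under stated assumptions, not the paper's implementation: the exploration kernel is random-walk Metropolis rather than SGD, the energies are exact (so no bias correction for stochastic swaps is needed), and the `window` parameter is one plausible reading of the generalized scheme in which each parity is held for `window` consecutive iterations (`window=1` recovers classic DEO). The names `active_pairs`, `log_target`, and `parallel_tempering` are hypothetical.

```python
import numpy as np


def active_pairs(t, num_chains, window=1):
    """Adjacent pairs (i, i+1) allowed to attempt a swap at iteration t.

    window=1 is the classic even-odd alternation. window>1 holds each
    parity for `window` iterations (an assumed form of the generalization).
    """
    parity = (t // window) % 2  # 0 -> even pairs, 1 -> odd pairs
    return [(i, i + 1) for i in range(parity, num_chains - 1, 2)]


def log_target(x):
    """Unnormalized log density of an equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    return np.logaddexp(-0.5 * ((x + 3.0) / 0.5) ** 2,
                        -0.5 * ((x - 3.0) / 0.5) ** 2)


def parallel_tempering(num_iters=5000, betas=(1.0, 0.5, 0.25, 0.1),
                       window=1, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    num_chains = len(betas)
    x = np.full(num_chains, -3.0)  # all chains start in the left mode
    samples = np.empty(num_iters)
    swaps = 0
    for t in range(num_iters):
        # Exploration: one random-walk Metropolis step per chain,
        # each chain targeting the tempered density pi(x)^beta.
        proposal = x + step * rng.standard_normal(num_chains)
        log_accept = betas * (log_target(proposal) - log_target(x))
        accepted = np.log(rng.random(num_chains)) < log_accept
        x = np.where(accepted, proposal, x)

        # Communication: deterministic even-odd swap attempts.
        for i, j in active_pairs(t, num_chains, window):
            log_ratio = (betas[i] - betas[j]) * (log_target(x[j]) - log_target(x[i]))
            if np.log(rng.random()) < log_ratio:
                x[i], x[j] = x[j], x[i]
                swaps += 1

        samples[t] = x[0]  # record the chain at beta = 1 (the target)
    return samples, swaps
```

Calling `parallel_tempering(window=1)` and comparing a histogram of the returned samples against runs with larger `window` values is one way to explore how the alternation period affects mixing between the two modes on this toy problem.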
