Measuring the reliability of MCMC inference with bidirectional Monte Carlo
This work addresses the difficulty of evaluating inference quality for researchers and practitioners who use probabilistic programming. The contribution is incremental in that it builds on the existing bidirectional Monte Carlo technique.
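The paper tackles the challenge of measuring the quality of MCMC-based posterior inference. It extends bidirectional Monte Carlo to upper bound, in expectation, the symmetrized KL divergence between the true posterior and the distribution of approximate samples on simulated data. The resulting protocol, BREAD, is integrated into WebPPL and Stan, where it is used to validate the relevance of simulated-data experiments to real datasets and to guide the design of inference algorithms.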
Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples. This challenge is particularly salient in black box inference methods, which can hide details and obscure inference failures. In this work, we extend the recently introduced bidirectional Monte Carlo technique to evaluate MCMC-based posterior inference algorithms. By running annealed importance sampling (AIS) chains both from prior to posterior and vice versa on simulated data, we upper bound in expectation the symmetrized KL divergence between the true posterior distribution and the distribution of approximate samples. We present Bounding Divergences with REverse Annealing (BREAD), a protocol for validating the relevance of simulated data experiments to real datasets, and integrate it into two probabilistic programming languages: WebPPL and Stan. As an example of how BREAD can be used to guide the design of inference algorithms, we apply it to study the effectiveness of different model representations in both WebPPL and Stan.
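The forward and reverse AIS procedure described above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's WebPPL or Stan implementation. It assumes a one-dimensional model with prior theta ~ N(0, 1) and likelihood y ~ N(theta, SIGMA^2), a geometric annealing path, and a random-walk Metropolis transition; the names `bdmc_bounds`, `mh_step`, `SIGMA`, and all parameter values are hypothetical choices made for illustration. Because the data are simulated from the model, the parameter that generated each dataset is an exact posterior sample and can initialize the reverse chain. The forward chain gives a stochastic lower bound on log p(y), the reverse chain gives a stochastic upper bound, and the average gap between the two bounds, in expectation, the symmetrized KL divergence between the true posterior and the distribution of the final forward state.

```python
import numpy as np

SIGMA = 0.5  # likelihood noise std; an assumed toy value


def log_prior(theta):
    """log N(theta | 0, 1), elementwise."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * theta**2


def log_lik(theta, y):
    """log N(y | theta, SIGMA^2), elementwise."""
    return -0.5 * np.log(2 * np.pi * SIGMA**2) - 0.5 * ((y - theta) / SIGMA) ** 2


def mh_step(theta, y, beta, rng, step=0.5):
    """One random-walk Metropolis step targeting prior * likelihood**beta.

    The kernel is reversible, so it serves in both annealing directions.
    """
    prop = theta + step * rng.standard_normal(theta.shape)
    log_ratio = (log_prior(prop) + beta * log_lik(prop, y)
                 - log_prior(theta) - beta * log_lik(theta, y))
    accept = np.log(rng.uniform(size=theta.shape)) < log_ratio
    return np.where(accept, prop, theta)


def bdmc_bounds(n_data=2000, n_temps=30, seed=0):
    """Return mean lower bound, mean upper bound, and mean exact log p(y)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps + 1)

    # Simulate datasets; theta_true is an exact posterior sample given y.
    theta_true = rng.standard_normal(n_data)
    y = theta_true + SIGMA * rng.standard_normal(n_data)

    # Forward AIS: prior -> posterior, stochastic lower bound on log p(y).
    theta = rng.standard_normal(n_data)
    lower = np.zeros(n_data)
    for t in range(1, n_temps + 1):
        lower += (betas[t] - betas[t - 1]) * log_lik(theta, y)
        theta = mh_step(theta, y, betas[t], rng)

    # Reverse AIS: posterior -> prior, stochastic upper bound on log p(y).
    theta = theta_true.copy()
    upper = np.zeros(n_data)
    for t in range(n_temps, 0, -1):
        theta = mh_step(theta, y, betas[t], rng)
        upper += (betas[t] - betas[t - 1]) * log_lik(theta, y)

    # Closed-form marginal for this conjugate toy model: y ~ N(0, 1 + SIGMA^2).
    var_y = 1.0 + SIGMA**2
    exact = -0.5 * np.log(2 * np.pi * var_y) - 0.5 * y**2 / var_y
    return lower.mean(), upper.mean(), exact.mean()


if __name__ == "__main__":
    lower, upper, exact = bdmc_bounds()
    print(f"lower={lower:.3f}  exact={exact:.3f}  upper={upper:.3f}  gap={upper - lower:.3f}")
```

The bounds hold only in expectation, so the sketch averages over many simulated datasets rather than relying on a single run. Increasing `n_temps` should shrink the gap, since the annealing path is traversed more slowly. The closed-form marginal is available only because the toy model is conjugate; in realistic models the gap itself is the diagnostic.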