ML LG Sep 30, 2019

Tightening Bounds for Variational Inference by Revisiting Perturbation Theory

Robert Bamler, Cheng Zhang, Manfred Opper, Stephan Mandt

arXiv:1910.00069v11 citations

Originality Incremental advance

AI Analysis

This work aims to improve variational inference approximations for machine learning researchers and practitioners. It is an incremental advance: it refines existing perturbation theory methods so that the resulting corrections yield valid bounds.

The paper tackles the bias in variational inference approximations by deriving new corrections to the evidence lower bound (ELBO) that resemble perturbation theory but yield a valid bound. Experiments on Gaussian Processes and Variational Autoencoders show that the new bounds improve posterior covariances and lead to higher likelihoods on held-out data.

Variational inference has become one of the most widely used methods in latent variable modeling. In its basic form, variational inference employs a fully factorized variational distribution and minimizes its KL divergence to the posterior. As the minimization can only be carried out approximately, this approximation induces a bias. In this paper, we revisit perturbation theory as a powerful way of improving the variational approximation. Perturbation theory relies on a form of Taylor expansion of the log marginal likelihood, vaguely in terms of the log ratio of the true posterior and its variational approximation. While first order terms give the classical variational bound, higher-order terms yield corrections that tighten it. However, traditional perturbation theory does not provide a lower bound, making it inapt for stochastic optimization. In this paper, we present a similar yet alternative way of deriving corrections to the ELBO that resemble perturbation theory, but that result in a valid bound. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.
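The abstract does not spell out the construction, so the following is only a minimal sketch of one way a perturbation-like correction can remain a valid bound. It assumes the marginal likelihood is written as p(x) = E_q[exp(-V)] with V = log q(z) - log p(x, z), and it uses the fact that an odd-order Taylor polynomial of exp lies below exp everywhere. The names `perturbative_bound`, `v`, and `v0`, the reference point `v0`, and the toy Gaussian model are illustrative assumptions, not taken from the paper.

```python
import math
import numpy as np


def perturbative_bound(v, v0, order):
    """Estimate e^{-v0} * sum_{k=0}^{order} E_q[(v0 - V)^k] / k! from samples of V.

    For odd `order`, the polynomial is below exp(v0 - V) for every sample,
    so the result never exceeds the sample mean of exp(-V).
    """
    if order % 2 == 0:
        raise ValueError("order must be odd for the polynomial to lower-bound exp")
    d = v0 - v
    poly = sum(d**k / math.factorial(k) for k in range(order + 1))
    return math.exp(-v0) * float(np.mean(poly))


# Toy model (assumed for illustration): z ~ N(0, 1), x | z ~ N(z, 1), observed x = 1.
# The exact evidence is N(x; 0, 2), and the exact posterior is N(x / 2, 1 / 2).
rng = np.random.default_rng(0)
x = 1.0
mu, sigma = 0.5, 1.0  # q(z) = N(mu, sigma^2), deliberately wider than the posterior

z = mu + sigma * rng.standard_normal(100_000)
log_joint = -0.5 * z**2 - 0.5 * (x - z) ** 2 - math.log(2 * math.pi)
log_q = -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
v = log_q - log_joint  # V = log q(z) - log p(x, z), evaluated at samples from q

elbo = -float(np.mean(v))  # Monte Carlo ELBO estimate
v0 = -elbo                 # reference point chosen as the sample mean of V

bound1 = perturbative_bound(v, v0, order=1)  # first order: equals exp(ELBO)
bound3 = perturbative_bound(v, v0, order=3)  # third-order correction
is_estimate = float(np.mean(np.exp(-v)))     # importance-sampling estimate of p(x)
exact_evidence = math.exp(-x**2 / 4) / math.sqrt(4 * math.pi)

print("ELBO:                 ", elbo)
print("log first-order bound:", math.log(bound1))
print("log third-order bound:", math.log(bound3))
print("log exact evidence:   ", math.log(exact_evidence))
```

With `v0` set to the sample mean of V, the first-order estimate reduces to exp(ELBO), which mirrors the abstract's statement that first-order terms give the classical variational bound. Fixing `v0` this way is a convenience for the sketch; it could equally be treated as a free parameter and optimized together with the variational parameters.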
