Variational Inference via $χ$-Upper Bound Minimization
The paper addresses a key limitation of variational inference for practitioners of Bayesian statistics: it offers a method that better captures posterior uncertainty, though the contribution is incremental in that it builds on existing divergence-based approaches.
The paper tackles the underestimation of posterior variance in variational inference by proposing CHIVI, an algorithm that minimizes the $χ$-divergence from the posterior to the approximating distribution. Compared with expectation propagation and classical VI, CHIVI yields better error rates and more accurate variance estimates on models such as probit regression and Gaussian process classification.
Variational inference (VI) is widely used as an efficient alternative to Markov chain Monte Carlo. It posits a family of approximating distributions $q$ and finds the closest member to the exact posterior $p$. Closeness is usually measured via a divergence $D(q || p)$ from $q$ to $p$. While successful, this approach also has problems. Notably, it typically leads to underestimation of the posterior variance. In this paper we propose CHIVI, a black-box variational inference algorithm that minimizes $D_χ(p || q)$, the $χ$-divergence from $p$ to $q$. CHIVI minimizes an upper bound of the model evidence, which we term the $χ$ upper bound (CUBO). Minimizing the CUBO leads to improved posterior uncertainty, and it can also be used with the classical VI lower bound (ELBO) to provide a sandwich estimate of the model evidence. We study CHIVI on three models: probit regression, Gaussian process classification, and a Cox process model of basketball plays. When compared to expectation propagation and classical VI, CHIVI produces better error rates and more accurate estimates of posterior variance.
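The sandwich estimate can be illustrated on a toy model whose evidence is known in closed form. The sketch below is not the CHIVI algorithm: it optimizes nothing and only evaluates both bounds at one fixed $q$. It assumes the CUBO has the form $\frac{1}{n} \log \mathbb{E}_q[(p(x, z)/q(z))^n]$, which is not spelled out in the abstract. The model (prior $z \sim N(0, 1)$, likelihood $x \mid z \sim N(z, 1)$), the choice $n = 2$, the approximation $q = N(0.5, 1)$, and all function names are illustrative assumptions.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def sandwich_estimate(x_obs=1.0, q_mean=0.5, q_var=1.0, n=2,
                      num_samples=20_000, seed=0):
    """Monte Carlo ELBO and CUBO_n for a toy conjugate Gaussian model.

    Assumed toy model (not from the paper): z ~ N(0, 1), x | z ~ N(z, 1),
    so the exact evidence is p(x) = N(x; 0, 2).
    """
    rng = random.Random(seed)
    log_w = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))  # sample from q
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x_obs, z, 1.0)
        log_w.append(log_joint - log_normal(z, q_mean, q_var))

    # ELBO: average log importance weight (lower bound on log p(x)).
    elbo = sum(log_w) / num_samples
    # CUBO_n: (1/n) * log of the average n-th power of the weights.
    cubo = log_mean_exp([n * lw for lw in log_w]) / n
    # Exact log evidence, available only because the model is conjugate.
    log_evidence = log_normal(x_obs, 0.0, 2.0)
    return elbo, log_evidence, cubo


if __name__ == "__main__":
    elbo, log_evidence, cubo = sandwich_estimate()
    print(f"ELBO={elbo:.4f}  log p(x)={log_evidence:.4f}  CUBO={cubo:.4f}")
```

Here $q$ is deliberately wider than the exact posterior $N(0.5, 0.5)$, so the importance weights are bounded and both estimates are stable. For finite sample sizes the Monte Carlo CUBO estimate is biased low, so in harder problems it may not sit above the true log evidence.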