On the Convergence of Stochastic Variational Inference in Bayesian Networks
This work addresses a convergence issue in stochastic variational inference that is relevant to researchers applying it to Bayesian networks, but the contribution is incremental because it builds on existing methods.
The paper identifies a pitfall in applying stochastic variational inference to Bayesian networks, showing that the beneficial scaling of initial step sizes is lost when approximations factorize, and provides experimental analysis of the trade-off between well-scaled steps and exact gradients.
We highlight a pitfall when applying stochastic variational inference to general Bayesian networks. For global random variables approximated by an exponential family distribution, natural gradient steps, commonly starting from a unit-length step size, are averaged to convergence. This useful insight into the scaling of initial step sizes is lost when the approximation factorizes across a general Bayesian network, and care must be taken to ensure practical convergence. We experimentally investigate how much of the baby (well-scaled steps) is thrown out with the bathwater (exact gradients).
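To make the "averaged to convergence" statement concrete, the following is a minimal sketch of the well-scaled case only, not the experimental setup of this work. It assumes a toy conjugate model that does not appear in the text above (x_i ~ N(mu, 1) with a Gaussian prior on the single global variable mu), a step-size schedule rho_t = 1/t, and one pass over the data without replacement. The function names svi_gaussian_mean and exact_posterior are hypothetical. Under these assumptions the first step has unit length, so the arbitrary initialization is discarded, and every later iterate is the running average of the single-data-point estimates of the natural parameters.

```python
import numpy as np


def svi_gaussian_mean(x, prior_mean=0.0, prior_prec=1.0, seed=0):
    """One pass of natural-gradient SVI for mu in the toy model x_i ~ N(mu, 1).

    q(mu) is Gaussian with natural parameters lam = (precision * mean, -precision / 2).
    The step size rho_t = 1/t is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    lam = np.array([5.0, -3.0])  # arbitrary start; erased by the first unit-length step
    estimates = []
    for t, i in enumerate(rng.permutation(n), start=1):
        # Noisy estimate of the optimal natural parameters from one data point,
        # with that point's contribution rescaled as if it were seen n times.
        lam_hat = np.array([prior_prec * prior_mean + n * x[i],
                            -0.5 * (prior_prec + n)])
        rho = 1.0 / t  # unit-length first step, then decaying
        # In a conjugate model the natural gradient is lam_hat - lam,
        # so the step is a convex combination of old and new parameters.
        lam = lam + rho * (lam_hat - lam)
        estimates.append(lam_hat)
    return lam, np.array(estimates)


def exact_posterior(x, prior_mean=0.0, prior_prec=1.0):
    """Natural parameters of the exact Gaussian posterior over mu."""
    n = len(x)
    return np.array([prior_prec * prior_mean + np.sum(x),
                     -0.5 * (prior_prec + n)])


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=50)
    lam, estimates = svi_gaussian_mean(data)
    print("SVI iterate:    ", lam)
    print("running average:", estimates.mean(axis=0))
    print("exact posterior:", exact_posterior(data))
```

In this toy setting the final iterate coincides, up to floating-point error, with both the average of the per-point estimates and the exact posterior, because each data point is visited exactly once. The sketch covers only a single exponential family global variable; it does not reproduce the factorized Bayesian network case in which, as described above, this natural scaling of the initial step is lost.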