Markovian Score Climbing: Variational Inference with KL(p||q)
This work addresses the problem of biased estimates in variational inference, offering researchers and practitioners a more reliable method for probabilistic inference in models such as Bayesian probit regression and stochastic volatility. It appears incremental, however, since it builds on prior work on the inclusive KL.
The paper tackles the challenge of minimizing the inclusive KL divergence KL(p||q) in variational inference by developing Markovian score climbing (MSC), a simple algorithm that uses stochastic gradients with vanishing bias to converge to a local optimum, avoiding the systematic errors of existing methods such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo.
Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions q and then finds the member of that family that is closest to the exact posterior p. Traditionally, VI algorithms minimize the "exclusive Kullback-Leibler (KL)" KL(q || p), often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL(p || q), which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL using stochastic gradients with vanishing bias. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.
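To make the idea concrete, here is a minimal sketch of a score-climbing loop on a one-dimensional toy problem. It rests on the fact that the gradient of KL(p || q) with respect to the variational parameters is the negative expected score of q under p, so a sample that is approximately distributed as p gives a stochastic ascent direction. Everything specific in the sketch is an assumption made for illustration and is not taken from the abstract: the Gaussian target, the Gaussian family for q, the use of a conditional importance sampling step as the Markov kernel, the step-size schedule, and all function names.

```python
import numpy as np

def log_p_unnorm(z):
    # Hypothetical toy target: unnormalized log-density of N(1.0, 2.0^2).
    return -0.5 * ((z - 1.0) / 2.0) ** 2

def log_q(z, m, log_s):
    # Log-density of the variational family q = N(m, exp(log_s)^2).
    s = np.exp(log_s)
    return -0.5 * ((z - m) / s) ** 2 - log_s - 0.5 * np.log(2.0 * np.pi)

def score_q(z, m, log_s):
    # Gradient of log q(z) with respect to (m, log_s).
    s2 = np.exp(2.0 * log_s)
    return np.array([(z - m) / s2, (z - m) ** 2 / s2 - 1.0])

def cis_kernel(z_prev, m, log_s, n_particles, rng):
    # Assumed Markov kernel: conditional importance sampling.
    # The previous state is retained as one particle, the rest are fresh
    # draws from q, and the next state is picked by importance weight p/q.
    z = np.empty(n_particles)
    z[0] = z_prev
    z[1:] = m + np.exp(log_s) * rng.standard_normal(n_particles - 1)
    log_w = log_p_unnorm(z) - log_q(z, m, log_s)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return z[rng.choice(n_particles, p=w / w.sum())]

def msc(n_iters=20000, n_particles=10, seed=0):
    rng = np.random.default_rng(seed)
    lam = np.array([0.0, 0.0])  # variational parameters (m, log_s)
    z = 0.0                     # state of the Markov chain
    for k in range(1, n_iters + 1):
        # Move the chain with a kernel built from the current q.
        z = cis_kernel(z, lam[0], lam[1], n_particles, rng)
        # Decaying step size (an assumed Robbins-Monro style schedule).
        step = 0.2 / k ** 0.6
        # Ascend the score of q at the current chain state.
        lam = lam + step * score_q(z, lam[0], lam[1])
    return lam[0], np.exp(lam[1])

if __name__ == "__main__":
    m, s = msc()
    print(f"fitted mean {m:.3f}, fitted std {s:.3f}")
```

For a Gaussian family, the inclusive KL is minimized by matching the mean and variance of the target, so on this toy problem the fitted mean and standard deviation should drift toward 1 and 2. Carrying the chain state from one iteration to the next is what distinguishes this loop from drawing a fresh self-normalized importance sample at every step.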