LG · AI · ML · Jun 13, 2022

Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients

arXiv:2206.06295v4 · 10 citations · h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck in variational inference for machine learning practitioners, offering incremental improvements in theoretical understanding and algorithm design.

The paper tackles the challenge of minimizing the inclusive KL divergence with SGD by analyzing Markov chain score ascent (MCSA) methods, providing the first non-asymptotic convergence analysis and developing a novel parallel scheme (pMCSA) that achieves a tighter gradient variance bound and superior empirical performance.

Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods, which we collectively refer to as Markov chain score ascent (MCSA) methods, can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.
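To make the idea concrete: the gradient of the inclusive KL divergence KL(pi || q_lambda) with respect to lambda is minus the posterior expectation of the score, grad_lambda log q_lambda(z). Since the posterior cannot be sampled directly, a Markov chain supplies the samples instead. The following is a minimal sketch of that general recipe, not the paper's pMCSA scheme or any of the analyzed methods as published. It assumes a hypothetical one-dimensional toy target N(2, 1), a variational family N(m, 1) with only the mean learned, an independent Metropolis-Hastings kernel that uses the current q as its proposal, and a 0.1/sqrt(t) step size. All function names are illustrative.

```python
import math
import random


def log_target(z):
    # Unnormalized log density of a toy "posterior" N(2, 1) (hypothetical example).
    return -0.5 * (z - 2.0) ** 2


def log_q(z, m):
    # Log density of q_m = N(m, 1), up to an additive constant that cancels below.
    return -0.5 * (z - m) ** 2


def imh_step(z, m, rng):
    # One independent Metropolis-Hastings step with q_m as the proposal.
    # The acceptance ratio is w(z_new) / w(z), where w = target / q_m.
    z_new = rng.gauss(m, 1.0)
    log_w_new = log_target(z_new) - log_q(z_new, m)
    log_w_old = log_target(z) - log_q(z, m)
    accept_prob = math.exp(min(0.0, log_w_new - log_w_old))
    return z_new if rng.random() < accept_prob else z


def mcsa_fit_mean(n_iters=5000, seed=0):
    # Score ascent on the mean m, driven by a single Markov chain state z.
    rng = random.Random(seed)
    m, z = 0.0, 0.0
    for t in range(1, n_iters + 1):
        z = imh_step(z, m, rng)          # advance the chain under the current q_m
        score = z - m                    # d/dm log q_m(z) for unit variance
        m += 0.1 / math.sqrt(t) * score  # ascend the score (descend inclusive KL)
    return m


if __name__ == "__main__":
    print(mcsa_fit_mean())
```

In this toy setting the inclusive KL minimizer is the target mean, so the returned value should settle near 2. The gradient estimate at each step is biased because the chain state is not an exact posterior draw, which is the situation the abstract's convergence analysis addresses.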

Code Implementations: 1 repo
