Mixture weights optimisation for Alpha-Divergence Variational Inference
This work addresses a theoretical gap in variational inference, although it is incremental in that it builds on existing α-divergence methods.
The paper tackles the optimisation of mixture weights in α-divergence variational inference. It establishes a full proof of convergence for the Power Descent algorithm when α<1, extends the algorithm to α=1, where it becomes an Entropic Mirror Descent, and numerically compares the unbiased Power Descent with the biased Rényi Descent it introduces, discussing the potential advantages of each.
This paper focuses on $α$-divergence minimisation methods for Variational Inference. More precisely, we are interested in algorithms that optimise the mixture weights of any given mixture model, without any information on the underlying distribution of its mixture components' parameters. The Power Descent, defined for all $α\neq 1$, is one such algorithm, and we establish the full proof of its convergence towards the optimal mixture weights when $α<1$. Since the $α$-divergence recovers the widely used forward Kullback-Leibler divergence when $α\to 1$, we then extend the Power Descent to the case $α=1$ and show that we obtain an Entropic Mirror Descent. This leads us to investigate the link between Power Descent and Entropic Mirror Descent: first-order approximations allow us to introduce the Rényi Descent, a novel algorithm for which we prove an $O(1/N)$ convergence rate. Lastly, we numerically compare the behaviour of the unbiased Power Descent and of the biased Rényi Descent, and we discuss the potential advantages of one algorithm over the other.
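To make the link between the Power Descent and the Entropic Mirror Descent concrete, the following is a minimal sketch of one weight update for each, under several assumptions that the abstract itself does not spell out. The sample space is taken to be finite so that all integrals become exact sums. The update form $λ_j \propto λ_j\,[(α-1)(b_j+κ)+1]^{η/(1-α)}$ with $b_j = \sum_y k_j(y)\, f'_α(q(y)/p(y))$ is the one usually given in the Power Descent literature. The names `b_alpha`, `power_descent_step` and `entropic_mirror_descent_step`, the step size `eta` and the constant `kappa` are illustrative choices, not the authors' code.

```python
import numpy as np


def b_alpha(lam, K, p, alpha):
    """Return b_j = sum_y K[j, y] * f'_alpha(q(y) / p(y)) with q = lam @ K.

    Assumption: finite sample space; K has shape (J, Y) with rows that are
    the mixture components, p has shape (Y,) and is strictly positive.
    """
    ratio = (lam @ K) / p
    if alpha == 1.0:
        # Limit of f'_alpha as alpha -> 1 (forward Kullback-Leibler case).
        fprime = np.log(ratio)
    else:
        fprime = (ratio ** (alpha - 1.0) - 1.0) / (alpha - 1.0)
    return K @ fprime


def power_descent_step(lam, K, p, alpha, eta=1.0, kappa=0.0):
    """One Power Descent update of the mixture weights (alpha != 1)."""
    b = b_alpha(lam, K, p, alpha)
    new = lam * ((alpha - 1.0) * (b + kappa) + 1.0) ** (eta / (1.0 - alpha))
    return new / new.sum()  # renormalise onto the simplex


def entropic_mirror_descent_step(lam, K, p, eta=1.0):
    """One Entropic Mirror Descent update, i.e. the alpha = 1 counterpart."""
    b = b_alpha(lam, K, p, 1.0)
    new = lam * np.exp(-eta * b)
    return new / new.sum()


# Toy usage: two components on a two-point space, target built from
# hypothetical "true" weights (0.3, 0.7).
K = np.array([[0.9, 0.1], [0.1, 0.9]])
p = np.array([0.3, 0.7]) @ K
lam = np.array([0.5, 0.5])
for _ in range(200):
    lam = power_descent_step(lam, K, p, alpha=0.5)
```

In this sketch the two updates agree in the limit because $[1+(α-1)v]^{η/(1-α)} \to e^{-ηv}$ as $α \to 1$, which is the sense in which the extension to $α=1$ yields an Entropic Mirror Descent. In practice the sums over $y$ would be replaced by Monte Carlo estimates, which this toy example deliberately avoids.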