Variational inference via radial transport
This work addresses a specific bottleneck for practitioners of variational inference, the poor coverage of Gaussian approximations, and offers an incremental add-on that improves approximation accuracy.
The paper tackles poor coverage in variational inference with Gaussian approximations by optimizing over radial profiles. The resulting algorithm, radVI, is a cheap add-on to existing VI schemes and comes with theoretical convergence guarantees.
In variational inference (VI), the practitioner approximates a high-dimensional distribution $π$ with a simple surrogate, often a (product) Gaussian distribution. However, in many cases of practical interest, Gaussian distributions might not capture the correct radial profile of $π$, resulting in poor coverage. In this work, we approach the VI problem from the perspective of optimizing over these radial profiles. Our algorithm radVI is a cheap, effective add-on to many existing VI schemes, such as Gaussian (mean-field) VI and the Laplace approximation. We provide theoretical convergence guarantees for our algorithm, owing to recent developments in optimization over the Wasserstein space (the space of probability distributions endowed with the Wasserstein distance) and new regularity properties of radial transport maps in the style of Caffarelli (2000).
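To make the notion of a mismatched radial profile concrete, the following is a minimal sketch and not the radVI algorithm itself. It assumes an isotropic Student-t target and a standard Gaussian surrogate (both chosen purely for illustration), and it corrects the surrogate with a map of the form x -> T(|x|) x/|x|, where T is fitted here by simple empirical quantile matching of the radii rather than by the Wasserstein-space optimization the abstract describes. All names (`sample_student_t`, `radial_quantile_map`, `coverage_gauss`, `coverage_radial`) are hypothetical.

```python
import numpy as np


def sample_student_t(n, d, nu, rng):
    """Draw n samples from an isotropic d-dimensional Student-t with nu degrees of freedom."""
    z = rng.standard_normal((n, d))
    g = rng.chisquare(nu, size=(n, 1))
    return z / np.sqrt(g / nu)


def radial_quantile_map(r_source, r_target, n_grid=513):
    """Monotone map T on radii, built by matching empirical quantiles.

    This is an illustrative stand-in for an optimized radial profile,
    not the procedure used by radVI.
    """
    qs = np.linspace(0.0, 1.0, n_grid)
    src_q = np.quantile(r_source, qs)
    tgt_q = np.quantile(r_target, qs)
    return lambda r: np.interp(r, src_q, tgt_q)


rng = np.random.default_rng(0)
d, nu, n = 10, 5.0, 50_000  # illustrative values

target = sample_student_t(n, d, nu, rng)  # assumed heavy-tailed target
gauss = rng.standard_normal((n, d))       # assumed Gaussian surrogate

r_target = np.linalg.norm(target, axis=1)
r_gauss = np.linalg.norm(gauss, axis=1)

# Ball that holds 95% of the Gaussian surrogate's mass,
# and the fraction of the target's mass that actually falls inside it.
r95_gauss = np.quantile(r_gauss, 0.95)
coverage_gauss = np.mean(r_target <= r95_gauss)

# Radial transport: keep each direction x/|x|, move the radius to T(|x|).
T = radial_quantile_map(r_gauss, r_target)
transported = gauss * (T(r_gauss) / r_gauss)[:, None]

r95_radial = np.quantile(np.linalg.norm(transported, axis=1), 0.95)
coverage_radial = np.mean(r_target <= r95_radial)

print(f"target mass inside the Gaussian 95% ball:    {coverage_gauss:.3f}")
print(f"target mass inside the transported 95% ball: {coverage_radial:.3f}")
```

In this toy setting the Gaussian's nominal 95% ball captures well under 95% of the target's mass, while the radially transported surrogate is close to nominal. The directions of the samples are untouched; only the one-dimensional radial profile changes, which is why such a correction can be a cheap add-on to an existing Gaussian fit.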