ML LG CO Mar 20, 2024

Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs

Charles C. Margossian, Loucas Pillaud-Vivien, Lawrence K. Saul

arXiv:2403.13748v5 18.310 citations h-index: 14 Has Code

Originality: Incremental advance

AI Analysis

This work addresses a fundamental limitation of variational inference that is relevant to practitioners in machine learning and statistics, and offers an incremental analysis of the trade-offs in uncertainty estimation.

The paper studies variational inference (VI) for uncertainty quantification. It shows that when the target distribution has a non-diagonal covariance matrix, a factorized approximation cannot correctly estimate more than one of several uncertainty measures at once. It then analyzes how the choice of divergence determines which measure is correctly estimated, and evaluates the resulting conclusions empirically on non-Gaussian targets.

Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though $p$ itself does not factorize. We show that this mismatch can lead to an impossibility theorem: if $p$ does not factorize and furthermore has a non-diagonal covariance matrix, then any factorized approximation $q\!\in\!Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which for elliptical distributions is closely related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general $\alpha$-divergences, and a score-based divergence which compares $\nabla \log p$ and $\nabla \log q$. We thoroughly analyze the case where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. In this setting, we show that all the considered divergences can be ordered based on the estimates of uncertainty they yield as objective functions for VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian.
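The trade-off described in the abstract can be illustrated numerically for the Gaussian case. The sketch below is not the authors' code. It relies on the standard closed-form results for a Gaussian target with covariance $\Sigma$ and a factorized Gaussian $q$: minimizing $\mathrm{KL}(q\|p)$ gives variances $1/(\Sigma^{-1})_{ii}$ (matching the marginal precisions), while minimizing $\mathrm{KL}(p\|q)$ gives variances $\Sigma_{ii}$ (matching the marginal variances). The function name `factorized_gaussian_fits` and the example covariance are hypothetical choices made for illustration.

```python
import numpy as np


def factorized_gaussian_fits(cov):
    """Closed-form factorized Gaussian fits to a Gaussian target.

    Hypothetical helper for illustration only. Returns the per-coordinate
    variances of the factorized Gaussian q that minimizes KL(q || p) and
    of the one that minimizes KL(p || q), for p = N(mean, cov).
    """
    cov = np.asarray(cov, dtype=float)
    precision = np.linalg.inv(cov)
    # Reverse KL, KL(q || p): q matches the marginal precisions of p.
    var_reverse_kl = 1.0 / np.diag(precision)
    # Forward KL, KL(p || q): q matches the marginal variances of p.
    var_forward_kl = np.diag(cov).copy()
    return var_reverse_kl, var_forward_kl


# Example target with a non-diagonal covariance (assumed values).
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

var_rev, var_fwd = factorized_gaussian_fits(cov)

# Generalized variance = determinant of the covariance matrix.
gen_var_target = np.linalg.det(cov)
gen_var_rev = np.prod(var_rev)  # q is diagonal, so det = product of variances
gen_var_fwd = np.prod(var_fwd)

print("marginal variances of p:   ", np.diag(cov))
print("variances from KL(q || p): ", var_rev)
print("variances from KL(p || q): ", var_fwd)
print("generalized variance of p: ", gen_var_target)
print("generalized variance, KL(q || p):", gen_var_rev)
print("generalized variance, KL(p || q):", gen_var_fwd)
```

In this example neither fit recovers the generalized variance of $p$: the reverse-KL fit underestimates it and the forward-KL fit overestimates it, and each fit matches only one of the other two measures, consistent with the impossibility result stated above.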

