Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications

Peiwen Jiang, Takuo Matsubara, Minh-Ngoc Tran

arXiv:2602.04272v11.2

Originality: Incremental advance

AI Analysis

This work addresses a stability issue in variational inference that matters to practitioners. The advance is incremental: it builds on existing IW-ELBO methods and adds a geometric twist.

The paper targets the inefficient optimization of the Importance-Weighted Evidence Lower Bound (IW-ELBO) in variational inference, which stems from the vanishing signal-to-noise ratio (SNR) of its gradient estimators in Euclidean space. It formulates the optimization in Bures-Wasserstein space and proves that the SNR of the Wasserstein gradient scales as Ω(√K) in the number of importance samples K. Experiments report superior approximation performance compared to other baselines.

The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating mode-seeking behaviour. However, optimising the IW-ELBO in Euclidean space is often inefficient, as its gradient estimators suffer from a vanishing signal-to-noise ratio (SNR). This paper formulates the optimisation of the IW-ELBO in Bures-Wasserstein space, a manifold of Gaussian distributions equipped with the 2-Wasserstein metric. We derive the Wasserstein gradient of the IW-ELBO and project it onto the Bures-Wasserstein space to yield a tractable algorithm for Gaussian VI. A pivotal contribution of our analysis concerns the stability of the gradient estimator. While the SNR of the standard Euclidean gradient estimator is known to vanish as the number of importance samples $K$ increases, we prove that the SNR of the Wasserstein gradient scales favourably as $\Omega(\sqrt{K})$, ensuring optimisation efficiency even for large $K$. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation performance compared to other baselines.
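
To make the claim that the IW-ELBO tightens the standard ELBO concrete, here is a minimal sketch of the usual Monte Carlo estimate of the bound, $\mathbb{E}\left[\log \frac{1}{K}\sum_{k=1}^{K} p(x, z_k)/q(z_k)\right]$ with $z_k \sim q$. Everything in it is an assumption chosen for illustration and is not taken from the paper: the toy model (prior $z \sim N(0,1)$, likelihood $x \mid z \sim N(z,1)$, one observation), the deliberately mismatched Gaussian $q$, and the helper names `log_normal` and `iw_elbo`. It only evaluates the bound; it does not implement the Bures-Wasserstein gradient or any SNR analysis.

```python
import numpy as np


def log_normal(x, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def iw_elbo(x_obs, q_mean, q_std, K, n_rep, rng):
    """Monte Carlo estimate of the IW-ELBO with K importance samples.

    Hypothetical toy model: z ~ N(0, 1), x | z ~ N(z, 1).
    Variational family: q(z) = N(q_mean, q_std^2).
    The outer expectation is averaged over n_rep independent replications.
    """
    # Draw n_rep independent batches of K samples from q.
    z = q_mean + q_std * rng.standard_normal((n_rep, K))
    # Log importance weights: log p(x, z) - log q(z).
    log_w = (
        log_normal(z, 0.0, 1.0)
        + log_normal(x_obs, z, 1.0)
        - log_normal(z, q_mean, q_std)
    )
    # Numerically stable log of the mean weight within each batch.
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    return log_mean_w.mean()


rng = np.random.default_rng(0)
x_obs = 1.0

# For this toy model the evidence is available in closed form: x ~ N(0, 2).
log_evidence = log_normal(x_obs, 0.0, np.sqrt(2.0))

# K = 1 recovers the standard ELBO; larger K gives a tighter bound.
bounds = {K: iw_elbo(x_obs, 0.0, 1.0, K, 20000, rng) for K in (1, 10, 100)}

for K, value in bounds.items():
    print(f"K={K:>3}: IW-ELBO estimate {value:.4f} (log evidence {log_evidence:.4f})")
```

In this toy setting the estimates increase with $K$ and approach the closed-form log evidence from below, which is the tightening the abstract refers to. The gradient SNR behaviour discussed in the abstract is a separate matter that this sketch does not measure.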
