ML · LG · Feb 1

Importance Weighted Variational Inference without the Reparameterization Trick

arXiv:2602.01412v1
Originality: Highly original
AI Analysis

This work addresses a bottleneck in variational inference for researchers and practitioners dealing with non-reparameterizable models, offering a theoretically justified alternative to improve optimization in challenging scenarios.

The paper addresses how to optimize importance weighted variational inference without the reparameterization trick, whose use restricts the choice of model and variational family. It analyzes REINFORCE gradient estimators, shows that existing VIMCO estimators have a signal-to-noise ratio (SNR) that vanishes as the number of samples $N$ grows, and introduces VIMCO-$\star$ to avoid this collapse. The resulting estimator achieves a $\sqrt{N}$ SNR scaling and outperforms current VIMCO implementations empirically in settings where reparameterized gradients are unavailable.

Importance weighted variational inference (VI) approximates densities known up to a normalizing constant by optimizing bounds that tighten with the number of Monte Carlo samples $N$. Standard optimization relies on reparameterized gradient estimators, which are well-studied theoretically yet restrict both the choice of the data-generating process and the variational approximation. While REINFORCE gradient estimators do not suffer from such restrictions, they lack rigorous theoretical justification. In this paper, we provide the first comprehensive analysis of REINFORCE gradient estimators in importance weighted VI, leveraging this theoretical foundation to diagnose and resolve fundamental deficiencies in current state-of-the-art estimators. Specifically, we introduce and examine a generalized family of variational inference for Monte Carlo objectives (VIMCO) gradient estimators. We prove that state-of-the-art VIMCO gradient estimators exhibit a vanishing signal-to-noise ratio (SNR) as $N$ increases, which prevents effective optimization. To overcome this issue, we propose the novel VIMCO-$\star$ gradient estimator and show that it averts the SNR collapse of existing VIMCO gradient estimators by achieving a $\sqrt{N}$ SNR scaling instead. We demonstrate its superior empirical performance compared to current VIMCO implementations in challenging settings where reparameterized gradients are typically unavailable.
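To make the quantities in the abstract concrete, the following is a minimal NumPy sketch on a toy problem. It assumes a one-dimensional unnormalized target $\exp(-z^2/2)$ and a Gaussian variational approximation; all function names (`iw_bound`, `vimco_grad_mu`, `snr`) are hypothetical and chosen for illustration. The gradient estimator shown is the original leave-one-out VIMCO estimator of Mnih and Rezende, not the paper's VIMCO-$\star$, whose form is not given in the abstract.

```python
import numpy as np

# Log normalizing constant of the toy target exp(-z^2 / 2) (assumed for illustration).
LOG_Z = 0.5 * np.log(2.0 * np.pi)


def log_target(z):
    """Unnormalized log density of the toy target."""
    return -0.5 * z**2


def log_q(z, mu, sigma):
    """Log density of the Gaussian variational approximation N(mu, sigma^2)."""
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)


def logmeanexp(a, axis):
    """Numerically stable log of the mean of exp(a) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))


def iw_bound(mu, sigma, n_samples, n_reps, rng):
    """Monte Carlo estimate of the importance weighted bound with N = n_samples."""
    z = mu + sigma * rng.standard_normal((n_reps, n_samples))
    log_w = log_target(z) - log_q(z, mu, sigma)
    return float(np.mean(logmeanexp(log_w, axis=1)))


def vimco_grad_mu(mu, sigma, n_samples, n_reps, rng):
    """Leave-one-out VIMCO gradient estimates w.r.t. mu, one per replicate.

    Samples are treated as fixed draws from q (score-function view), so no
    reparameterized gradient through z is used.
    """
    if n_samples < 2:
        raise ValueError("the leave-one-out baseline needs at least 2 samples")
    z = mu + sigma * rng.standard_normal((n_reps, n_samples))
    log_w = log_target(z) - log_q(z, mu, sigma)                      # (R, N)
    bound = logmeanexp(log_w, axis=1)                                # (R,)

    # Baseline for sample j: replace log w_j by the mean of the other log weights.
    loo_mean = (log_w.sum(axis=1, keepdims=True) - log_w) / (n_samples - 1)
    eye = np.eye(n_samples, dtype=bool)
    swapped = np.where(eye, loo_mean[:, :, None], log_w[:, None, :])  # (R, N, N)
    baseline = logmeanexp(swapped, axis=2)                           # (R, N)

    signal = bound[:, None] - baseline           # per-sample learning signal
    score = (z - mu) / sigma**2                  # d/dmu log q(z_j; mu, sigma)
    # Self-normalized weights w_j / sum_i w_i.
    w_tilde = np.exp(log_w - (bound[:, None] + np.log(n_samples)))
    # Second term: d/dmu log w_j = -score when z_j is held fixed.
    return np.sum(signal * score, axis=1) - np.sum(w_tilde * score, axis=1)


def snr(grads):
    """Empirical signal-to-noise ratio |mean| / std of scalar gradient estimates."""
    return float(np.abs(grads.mean()) / grads.std())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (2, 8, 32):
        b = iw_bound(0.0, 2.0, n, 20000, rng)
        g = vimco_grad_mu(1.0, 1.0, n, 20000, rng)
        print(f"N={n:3d}  bound={b:.3f}  log Z={LOG_Z:.3f}  SNR={snr(g):.3f}")
```

Running the script prints the bound estimate and the empirical SNR of the gradient with respect to `mu` for a few values of $N$. The bound should approach `LOG_Z` from below as $N$ grows; how the SNR behaves in this toy setting is left for the reader to inspect rather than asserted here.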
