Amortized Variational Inference: When and Why?
This work addresses the theoretical limitations of A-VI for researchers in probabilistic modeling. It offers insight into when A-VI is applicable, but the contribution is incremental because it builds on existing variational inference methods.
The paper investigates when amortized variational inference (A-VI) can match the optimal solution of factorized variational inference (F-VI) in latent variable models. It derives necessary and sufficient conditions for closing the amortization gap, shows that these conditions hold for simple hierarchical models, and shows that the gap cannot be closed in other models, such as hidden Markov models.
In a probabilistic latent variable model, factorized (or mean-field) variational inference (F-VI) fits a separate parametric distribution for each latent variable. Amortized variational inference (A-VI) instead learns a common inference function, which maps each observation to its corresponding latent variable's approximate posterior. Typically, A-VI is used as a step in the training of variational autoencoders; however, it stands to reason that A-VI could also be used as a general alternative to F-VI. In this paper we study when and why A-VI can be used for approximate Bayesian inference. We derive necessary, sufficient, and verifiable conditions on a latent variable model under which A-VI can attain F-VI's optimal solution, thereby closing the amortization gap. We prove these conditions are uniquely verified by simple hierarchical models, a broad class that encompasses many models in machine learning. We then show, on a broader class of models, how to expand the domain of A-VI's inference function to improve its solution, and we provide examples, e.g. hidden Markov models, where the amortization gap cannot be closed.
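The difference between F-VI and A-VI can be made concrete with a minimal sketch. The toy model below is an assumption chosen for illustration and is not taken from the paper: z_n ~ N(0, 1) and x_n | z_n ~ N(z_n, sigma^2), with a Gaussian variational family. F-VI learns a separate mean and log standard deviation for every z_n, whereas A-VI learns a single inference function, here assumed to be linear, m(x) = a*x + b, with one shared log standard deviation. The function names (`elbo`, `fit_fvi`, `fit_avi`), the step size, and the iteration count are likewise hypothetical. In this conjugate case the optimal F-VI mean is itself a linear function of x_n, so the amortized fit is expected to reach the same ELBO.

```python
import numpy as np


def elbo(x, m, log_s, sigma):
    """ELBO, up to an additive constant, for q(z_n) = N(m_n, s_n^2)."""
    s2 = np.exp(2.0 * log_s)
    exp_log_lik = -((x - m) ** 2 + s2) / (2.0 * sigma**2)  # E_q[log p(x_n | z_n)]
    exp_log_prior = -(m**2 + s2) / 2.0                     # E_q[log p(z_n)]
    entropy = log_s                                        # entropy of q(z_n)
    return np.sum(exp_log_lik + exp_log_prior + entropy)


def fit_fvi(x, sigma, lr=0.05, steps=2000):
    """F-VI: one (m_n, log_s_n) pair per latent variable, gradient ascent."""
    m = np.zeros_like(x)
    log_s = np.zeros_like(x)
    for _ in range(steps):
        s2 = np.exp(2.0 * log_s)
        grad_m = (x - m) / sigma**2 - m
        grad_log_s = 1.0 - s2 * (1.0 / sigma**2 + 1.0)
        m = m + lr * grad_m
        log_s = log_s + lr * grad_log_s
    return m, log_s


def fit_avi(x, sigma, lr=0.05, steps=2000):
    """A-VI: shared inference function m(x) = a*x + b and shared log_s."""
    a, b, log_s = 0.0, 0.0, 0.0
    for _ in range(steps):
        m = a * x + b
        s2 = np.exp(2.0 * log_s)
        grad_m = (x - m) / sigma**2 - m       # per-datapoint gradient w.r.t. m_n
        a += lr * np.mean(grad_m * x)         # chain rule through m_n = a*x_n + b
        b += lr * np.mean(grad_m)
        log_s += lr * (1.0 - s2 * (1.0 / sigma**2 + 1.0))
    return a, b, log_s


# Simulate data from the assumed toy model.
rng = np.random.default_rng(0)
sigma = 1.0
z = rng.normal(size=200)
x = z + sigma * rng.normal(size=200)

m_f, log_s_f = fit_fvi(x, sigma)
a, b, log_s_a = fit_avi(x, sigma)

# Amortization gap: difference between the two optimized ELBO values.
gap = elbo(x, m_f, log_s_f, sigma) - elbo(x, a * x + b, log_s_a, sigma)
print("amortization gap:", gap)
```

F-VI optimizes 2N variational parameters here, while A-VI optimizes three regardless of N. The sketch says nothing about models such as hidden Markov models, where, according to the abstract, no inference function of x_n alone can close the gap.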