On the Necessity and Effectiveness of Learning the Prior of Variational Auto-Encoder
This work addresses a fundamental issue in variational inference for machine learning practitioners, offering an architecturally simpler alternative to complex hierarchical models.
The paper tackles the mismatch between the aggregated posterior and the unit Gaussian prior in variational auto-encoders. It proves that learning the prior is necessary and effective in this setting, and it achieves test negative log-likelihood comparable to state-of-the-art hierarchical VAEs using a simpler architecture.
Using powerful posterior distributions is a popular approach to achieving better variational inference. However, recent works have shown that the aggregated posterior may fail to match the unit Gaussian prior, so learning the prior becomes an alternative way to improve the lower bound. In this paper, for the first time in the literature, we prove the necessity and effectiveness of learning the prior when the aggregated posterior does not match the unit Gaussian prior, analyze why this situation may arise, and propose the hypothesis that learning the prior may improve the reconstruction loss, all of which are supported by our extensive experimental results. We show that, using a learned Real NVP prior and just one latent variable in a VAE, we can achieve test negative log-likelihood (NLL) comparable to that of a very deep state-of-the-art hierarchical VAE, outperforming many previous works that use complex hierarchical VAE architectures.
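
To make the idea of a learned Real NVP prior concrete, the following is a minimal NumPy sketch of how a flow-based prior density log p(z) could be evaluated in place of the unit Gaussian. It is an illustration under stated assumptions, not the architecture used in the paper: the class name `AffineCoupling`, the function `log_prior`, the single linear maps standing in for the scale and shift networks, and the latent dimension are all hypothetical choices. The prior is defined by an invertible map from z to a base variable w with a unit Gaussian density, so that log p(z) = log N(w; 0, I) + log |det dw/dz|.

```python
import numpy as np


def standard_normal_logpdf(w):
    # Log-density of N(0, I), summed over the last axis.
    return -0.5 * np.sum(w ** 2 + np.log(2.0 * np.pi), axis=-1)


class AffineCoupling:
    """One Real NVP-style affine coupling layer mapping z -> w.

    Assumption: tiny linear maps stand in for the scale/shift networks.
    """

    def __init__(self, dim, rng, flip=False):
        self.d = dim // 2          # size of the part passed through unchanged
        self.flip = flip           # reverse coordinates so layers alternate halves
        k = dim - self.d
        self.Ws = 0.1 * rng.standard_normal((self.d, k))  # scale weights
        self.Wt = 0.1 * rng.standard_normal((self.d, k))  # shift weights

    def forward(self, z):
        if self.flip:
            z = z[..., ::-1]
        z1, z2 = z[..., :self.d], z[..., self.d:]
        s = np.tanh(z1 @ self.Ws)  # bounded log-scale for numerical stability
        t = z1 @ self.Wt
        w = np.concatenate([z1, z2 * np.exp(s) + t], axis=-1)
        log_det = np.sum(s, axis=-1)  # Jacobian is triangular: log|det| = sum(s)
        if self.flip:
            w = w[..., ::-1]
        return w, log_det

    def inverse(self, w):
        if self.flip:
            w = w[..., ::-1]
        w1, w2 = w[..., :self.d], w[..., self.d:]
        s = np.tanh(w1 @ self.Ws)
        t = w1 @ self.Wt
        z = np.concatenate([w1, (w2 - t) * np.exp(-s)], axis=-1)
        if self.flip:
            z = z[..., ::-1]
        return z


def log_prior(z, layers):
    # Change of variables: log p(z) = log N(w; 0, I) + sum of layer log-dets.
    total_log_det = 0.0
    for layer in layers:
        z, log_det = layer.forward(z)
        total_log_det = total_log_det + log_det
    return standard_normal_logpdf(z) + total_log_det
```

In a VAE objective, `log_prior(z, layers)` would take the place of the unit Gaussian log-density in the KL term, with the coupling parameters trained jointly with the encoder and decoder. When all scale and shift weights are zero the flow is the identity and the sketch reduces to the standard unit Gaussian prior.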