Variational autoencoders in the presence of low-dimensional data: landscape and implicit bias
This work addresses a key difficulty in VAE training for generative modeling, particularly for image data, by providing theoretical insight into convergence behavior. The contribution is incremental in that it builds on a conjecture from prior work.
The paper studies the training of variational autoencoders (VAEs) on data supported on a low-dimensional manifold. It shows that, for linear encoders and decoders, VAE training recovers a generator whose support equals the ground truth manifold, owing to the implicit bias of gradient descent. In the nonlinear case, training often learns a higher-dimensional superset of that manifold.
Variational Autoencoders are one of the most commonly used generative models, particularly for image data. A prominent difficulty in training VAEs is data that is supported on a lower-dimensional manifold. Recent work by Dai and Wipf (2020) proposes a two-stage training algorithm for VAEs, based on a conjecture that in standard VAE training the generator will converge to a solution with zero variance which is correctly supported on the ground truth manifold. They gave partial support for that conjecture by showing that some optima of the VAE loss do satisfy this property, but did not analyze the training dynamics. In this paper, we show that for linear encoders/decoders, the conjecture is true: VAE training does recover a generator with support equal to the ground truth manifold, and it does so due to an implicit bias of gradient descent rather than merely the VAE loss itself. In the nonlinear case, we show that VAE training frequently learns a higher-dimensional manifold which is a superset of the ground truth manifold.
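To make the zero-variance behavior concrete, the following toy sketch evaluates a linear Gaussian VAE loss on data that lies on a line in three dimensions. It is an illustration under stated assumptions, not the paper's experiment: it assumes the standard negative ELBO with decoder N(Az, sigma2·I), encoder N(Bx, diag(s2)), and a standard normal prior, and the names `neg_elbo`, `loss_on_line`, `A`, `B`, `s2`, and `sigma2` are hypothetical. The decoder is fixed to be supported on the true line, and only the decoder variance is varied.

```python
import math


def neg_elbo(x, A, B, s2, sigma2):
    """Per-sample negative ELBO of a linear Gaussian VAE (assumed standard form).

    Decoder: p(x|z) = N(A z, sigma2 * I); encoder: q(z|x) = N(B x, diag(s2));
    prior: N(0, I). The additive constant (d/2) * log(2*pi) is dropped.
    """
    d, k = len(A), len(A[0])
    mu = [sum(B[j][i] * x[i] for i in range(d)) for j in range(k)]      # B x
    recon = [sum(A[i][j] * mu[j] for j in range(k)) for i in range(d)]  # A B x
    sq_err = sum((x[i] - recon[i]) ** 2 for i in range(d))
    # Extra expected reconstruction error from encoder noise:
    # sum_j s2[j] * ||A[:, j]||^2
    noise = sum(s2[j] * sum(A[i][j] ** 2 for i in range(d)) for j in range(k))
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder
    kl = 0.5 * sum(mu[j] ** 2 + s2[j] - math.log(s2[j]) - 1.0 for j in range(k))
    return (sq_err + noise) / (2.0 * sigma2) + 0.5 * d * math.log(sigma2) + kl


def loss_on_line(sigma2, ts=(-1.5, -0.5, 0.5, 1.5)):
    """Average negative ELBO for points t * (1, 0, 0) (toy data, an assumption)."""
    # Decoder supported on the data line; the second latent coordinate is unused.
    A = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    # Exact posterior mean and variances for this linear Gaussian model.
    B = [[1.0 / (1.0 + sigma2), 0.0, 0.0], [0.0, 0.0, 0.0]]
    s2 = [sigma2 / (1.0 + sigma2), 1.0]
    return sum(neg_elbo([t, 0.0, 0.0], A, B, s2, sigma2) for t in ts) / len(ts)


# Shrink the decoder variance from 1e-1 down to 1e-6.
losses = [loss_on_line(10.0 ** -p) for p in range(1, 7)]
for p, value in zip(range(1, 7), losses):
    print(f"sigma2 = 1e-{p}: loss = {value:.4f}")
```

In this toy setting the loss keeps decreasing as `sigma2` shrinks, because the data occupies one of three ambient dimensions and the `log(sigma2)` terms from the two unused dimensions are not offset by the KL term. This only shows that low-loss, near-zero-variance solutions supported on the data subspace exist; it says nothing about whether gradient descent reaches them, which is the question the paper analyzes.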