From Points to Spheres: A Geometric Reinterpretation of Variational Autoencoders
This work offers a new geometric framework for understanding VAEs. The contribution is incremental: it complements existing probabilistic views without introducing new methods or broad performance gains.
The authors address the problem of interpreting Variational Autoencoders (VAEs) by proposing a geometric reinterpretation in which latent representations are Gaussian balls rather than deterministic points. They show that this perspective makes the model more intuitive and connects it with VQ-VAE, giving a unified understanding of latent space geometry.
The Variational Autoencoder (VAE) is typically understood from the perspective of probabilistic inference. In this work, we propose a new geometric reinterpretation that complements the probabilistic view and makes it more intuitive. We demonstrate that the proper construction of semantic manifolds arises primarily from the constraining effect of the KL divergence on the encoder. We view each latent representation as a Gaussian ball rather than a deterministic point. Under the KL divergence constraint, these Gaussian balls regularize the latent space, promoting a more uniform distribution of encodings. Furthermore, we show that reparameterization establishes a critical contractual mechanism between the encoder and decoder, enabling the decoder to learn how to reconstruct from these stochastic regions. We further connect this viewpoint with VQ-VAE, offering a unified perspective: VQ-VAE can be seen as an autoencoder whose encodings are constrained to a set of cluster centers, with its generative capability arising from its compactness rather than its stochasticity. This geometric framework provides a new lens for understanding how a VAE shapes the latent geometry to enable effective generation.
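To make the Gaussian-ball picture concrete, the following is a minimal sketch, not the authors' implementation. It assumes a diagonal Gaussian encoder output and a standard normal prior (the usual VAE setup, which the abstract does not spell out). The function names kl_to_standard_normal, reparameterize, and nearest_center are hypothetical and chosen only for illustration.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumption: diagonal Gaussian encoder and standard normal prior.
    The term grows when the ball drifts from the origin or its radius
    departs from 1, which is the constraining effect on the encoder.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    The decoder sees a random point of the ball around mu, not mu itself.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def nearest_center(z, codebook):
    """VQ-style lookup: index of the closest cluster center
    under squared Euclidean distance (no sampling involved)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(z, codebook[k])),
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# A unit-radius ball centered at (1, -2): KL = 0.5 * (1 + 4) = 2.5.
mu, log_var = [1.0, -2.0], [0.0, 0.0]
kl = kl_to_standard_normal(mu, log_var)
z = reparameterize(mu, log_var, rng)

# Shrinking the radius collapses the ball toward a deterministic point,
# but the KL term penalizes that collapse.
tiny_log_var = [-20.0, -20.0]
z_point = reparameterize(mu, tiny_log_var, rng)
kl_point = kl_to_standard_normal(mu, tiny_log_var)

# VQ-VAE view: the encoding snaps to one of a finite set of centers.
codebook = [[0.0, 0.0], [1.0, -2.0]]
idx = nearest_center(mu, codebook)
```

In this sketch the KL term is zero only for a unit ball at the origin, while a near-zero radius (a point-like encoding) is penalized heavily. That is one way to read the abstract's claim that the KL constraint keeps encodings as regions rather than points. The codebook lookup has no randomness at all, consistent with the view that VQ-VAE relies on compactness rather than stochasticity.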