Isolating Sources of Disentanglement in Variational Autoencoders
This work addresses the challenge of learning interpretable and controllable latent representations without supervision, a problem central to generative modeling and representation learning. The contribution is incremental, building directly on the existing β-VAE.
The authors approach disentangled representation learning in variational autoencoders by proposing the β-TCVAE, a refinement of the β-VAE objective that explicitly penalizes the total correlation between latent variables. They also introduce the mutual information gap (MIG), a classifier-free measure of disentanglement, and show experimentally that total correlation is strongly related to disentanglement.
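For reference, the evidence lower bound decomposition behind the β-TCVAE can be written as below. The notation is assumed here rather than taken from this summary. n indexes the N training examples with p(n) = 1/N, and q(z|n) is the encoder. q(z) = Σ_n q(z|n) p(n) is the aggregate posterior, and the prior is assumed to factorize as p(z) = ∏_j p(z_j).

Weighting the three KL terms equally (α = β = γ) recovers the β-VAE, which scales the whole KL term by β. The β-TCVAE instead fixes α = γ = 1 and tunes only β, so only the total correlation receives the extra penalty. This is consistent with the claim that no hyperparameters are added.

```latex
% Averaged KL term of the ELBO, split into three non-negative parts
\mathbb{E}_{p(n)}\!\left[\mathrm{KL}\big(q(z \mid n) \,\|\, p(z)\big)\right]
  = \underbrace{\mathrm{KL}\big(q(z, n) \,\|\, q(z)\,p(n)\big)}_{\text{index-code MI } I_q(z;n)}
  + \underbrace{\mathrm{KL}\Big(q(z) \,\Big\|\, \textstyle\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\textstyle\sum_j \mathrm{KL}\big(q(z_j) \,\|\, p(z_j)\big)}_{\text{dimension-wise KL}}

% beta-TCVAE objective: each term gets its own weight
\mathcal{L}_{\beta\text{-TC}}
  = \mathbb{E}_{q(z \mid n)\,p(n)}\big[\log p(n \mid z)\big]
  - \alpha\, I_q(z;n)
  - \beta\, \mathrm{KL}\Big(q(z) \,\Big\|\, \textstyle\prod_j q(z_j)\Big)
  - \gamma \textstyle\sum_j \mathrm{KL}\big(q(z_j) \,\|\, p(z_j)\big)
```

The identity follows because the log-ratios telescope. Summing log q(z|n) − log q(z), log q(z) − log ∏_j q(z_j), and log ∏_j q(z_j) − log ∏_j p(z_j) gives log q(z|n) − log p(z).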
We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate our $β$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the state-of-the-art $β$-VAE objective for learning disentangled representations, requiring no additional hyperparameters during training. We further propose a principled classifier-free measure of disentanglement called the mutual information gap (MIG). We perform extensive quantitative and qualitative experiments, in both restricted and non-restricted settings, and show a strong relation between total correlation and disentanglement, when the latent-variable model is trained using our framework.
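The mutual information gap can be made concrete with a small sketch. For each ground-truth factor v_k, MIG takes the gap between the largest and second-largest mutual information I(z_j; v_k) across latent dimensions z_j. It normalizes that gap by the factor entropy H(v_k) and averages over factors.

This sketch assumes discrete factor labels and one code vector per example, such as encoder means. It estimates mutual information by discretizing each latent dimension into equal-width histogram bins. That is a common simplification; the paper instead estimates I(z_j; v_k) from the encoder distribution. The function names `discretize`, `entropy`, `mutual_information`, and `mig`, and the default of 20 bins, are hypothetical choices for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    """Map continuous values to equal-width bin indices in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant dimension: everything in one bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels: Sequence[int]) -> float:
    """Empirical entropy (in nats) of a discrete sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    # p(x, y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
    return sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y]))
        for (x, y), c in joint.items()
    )


def mig(
    latents: Sequence[Sequence[float]],
    factors: Sequence[Sequence[int]],
    n_bins: int = 20,
) -> float:
    """Histogram-based MIG sketch.

    latents: N rows of J latent codes; factors: N rows of K discrete factors.
    """
    n_latents = len(latents[0])
    if n_latents < 2:
        raise ValueError("MIG needs at least two latent dimensions")
    # Discretize each latent dimension (column) independently
    codes = [
        discretize([row[j] for row in latents], n_bins) for j in range(n_latents)
    ]
    gaps = []
    for k in range(len(factors[0])):
        v = [row[k] for row in factors]
        h = entropy(v)
        if h == 0.0:
            continue  # a constant factor has zero entropy, so it is skipped
        # MI of every latent dimension with this factor, largest first
        mis = sorted((mutual_information(z, v) for z in codes), reverse=True)
        gaps.append((mis[0] - mis[1]) / h)
    if not gaps:
        raise ValueError("all factors are constant")
    return sum(gaps) / len(gaps)
```

Consider two independent binary factors, each copied into its own latent dimension. Every factor then has a gap of H(v_k) / H(v_k) = 1, so the score is 1. Copying the same factor into both dimensions makes the top two mutual informations equal and drives the score to 0. The penalty on the second-largest term is what keeps MIG from rewarding a factor that is spread over several latents.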