Local Disentanglement in Variational Auto-Encoders Using Jacobian $L_1$ Regularization
This work is aimed at researchers in representation learning. It addresses model identification issues in variational auto-encoders and offers an incremental improvement in disentanglement on image data.
The paper tackles the problem of latent space rotations in unsupervised representation learning by proposing an $L_1$ loss on the VAE's generative Jacobian, which encourages local alignment of latent variables with independent factors of variation in images. The result is improved local disentanglement, demonstrated qualitatively and quantitatively on various datasets using information-theoretic and modularity measures.
There have been many recent advances in representation learning; however, unsupervised representation learning can still struggle with model identification issues related to rotations of the latent space. Variational Auto-Encoders (VAEs) and their extensions such as $\beta$-VAEs have been shown to improve local alignment of latent variables with PCA directions, which can help to improve model disentanglement under some conditions. Borrowing inspiration from Independent Component Analysis (ICA) and sparse coding, we propose applying an $L_1$ loss to the VAE's generative Jacobian during training to encourage local latent variable alignment with independent factors of variation in images of multiple objects or images with multiple parts. We evaluate the approach on a variety of datasets, reporting qualitative and quantitative results based on information-theoretic and modularity measures; these show that the added $L_1$ cost encourages local axis alignment of the latent representation with individual factors of variation.
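
To make the idea of an $L_1$ loss on the generative Jacobian concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a toy one-hidden-layer decoder whose Jacobian can be written analytically, takes the penalty to be the sum of absolute Jacobian entries at a latent point, and adds it to the usual VAE terms with a weight. The names `decoder`, `init_params`, `beta`, and `gamma`, the layer sizes, and the exact form and weighting of the total loss are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def init_params(latent_dim=2, hidden_dim=8, output_dim=5, seed=0):
    """Random weights for a toy decoder (hypothetical sizes, for illustration only)."""
    rng = np.random.default_rng(seed)
    return (
        rng.normal(size=(hidden_dim, latent_dim)),  # W1
        rng.normal(size=hidden_dim),                # b1
        rng.normal(size=(output_dim, hidden_dim)),  # W2
        rng.normal(size=output_dim),                # b2
    )


def decoder(z, params):
    """Toy stand-in for a VAE decoder: g(z) = W2 @ tanh(W1 @ z + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ z + b1) + b2


def decoder_jacobian(z, params):
    """Analytic Jacobian dg/dz of the toy decoder, shape (output_dim, latent_dim)."""
    W1, b1, W2, _ = params
    h = np.tanh(W1 @ z + b1)
    # Chain rule: W2 @ diag(1 - h^2) @ W1, with the diagonal applied row-wise to W1.
    return W2 @ ((1.0 - h ** 2)[:, None] * W1)


def jacobian_l1_penalty(z, params):
    """Entrywise L1 norm of the generative Jacobian at the latent point z."""
    return np.abs(decoder_jacobian(z, params)).sum()


def total_loss(recon_loss, kl_loss, z, params, beta=1.0, gamma=0.1):
    """Assumed combination: reconstruction + beta * KL + gamma * L1 Jacobian penalty."""
    return recon_loss + beta * kl_loss + gamma * jacobian_l1_penalty(z, params)
```

Each column of the Jacobian describes how the generated output changes when one latent variable is perturbed, so penalizing the absolute entries favors latent variables that each affect only a few output dimensions locally. In an actual training loop the Jacobian would more plausibly be obtained by automatic differentiation, or approximated, rather than derived by hand as in this toy decoder.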