LG CV May 16, 2023

ProtoVAE: Prototypical Networks for Unsupervised Disentanglement

Vaishnavi Patil, Matthew Evanusa, Joseph JaJa

arXiv:2305.09092v12.0

Originality: Incremental advance

AI Analysis

This paper addresses the problem of learning interpretable representations without labels, which is relevant to researchers in generative modeling. It appears incremental, as it builds on existing VAE and metric learning methods.

The paper tackles unsupervised disentanglement by introducing ProtoVAE, a VAE-based model that uses a self-supervised prototypical network to impose the constraints needed for interpretable latent representations. The authors report state-of-the-art results on the dSprites, 3DShapes, and MPI3D benchmarks, based on quantitative disentanglement metrics and qualitative latent traversals.

Generative modeling and self-supervised learning have in recent years made great strides towards learning from data in a completely unsupervised way. There is still, however, an open area of investigation into guiding a neural network to encode the data into representations that are interpretable or explainable. The problem of unsupervised disentanglement is of particular importance, as it aims to discover the different latent factors of variation or semantic concepts from the data alone, without labeled examples, and to encode them into structurally disjoint latent representations. Without additional constraints or inductive biases placed in the network, a generative model may learn the data distribution and encode the factors, but not necessarily in a disentangled way. Here, we introduce a novel deep generative VAE-based model, ProtoVAE, that leverages a deep metric learning prototypical network trained using self-supervision to impose these constraints. The prototypical network constrains the mapping from the representation space to the data space, ensuring that controlled changes in the representation space are mapped to changes in the factors of variation in the data space. Our model is completely unsupervised and requires no a priori knowledge of the dataset, including the number of factors. We evaluate our proposed model on the benchmark dSprites, 3DShapes, and MPI3D disentanglement datasets, showing state-of-the-art results against previous methods via qualitative traversals in the latent space as well as quantitative disentanglement metrics. We further qualitatively demonstrate the effectiveness of our model on the real-world CelebA dataset.
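
The qualitative latent traversals mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `latent_traversal` and `toy_decoder`, the latent size of 4, and the linear decoder are all assumptions made for illustration, with the linear map standing in for a trained VAE decoder. The idea is to hold a latent code fixed, sweep one coordinate over a range of values, and decode each variant to see which factor of variation changes.

```python
import numpy as np


def latent_traversal(z, dim, values):
    """Return copies of z in which only coordinate `dim` is varied over `values`."""
    z = np.asarray(z, dtype=float)
    batch = np.tile(z, (len(values), 1))  # one row per traversal step
    batch[:, dim] = values                # overwrite only the traversed coordinate
    return batch


def toy_decoder(z_batch, weights):
    """Hypothetical stand-in for a trained decoder: a fixed linear map (assumption)."""
    return z_batch @ weights


rng = np.random.default_rng(0)
z0 = rng.normal(size=4)                # assumed latent code of dimension 4
steps = np.linspace(-2.0, 2.0, 5)      # values swept along one latent dimension
traversal = latent_traversal(z0, dim=2, values=steps)

weights = rng.normal(size=(4, 6))      # toy decoder parameters (assumption)
outputs = toy_decoder(traversal, weights)
```

With a well-disentangled model, the rows of `outputs` would differ in a single semantic factor (for example, only object scale on dSprites). In this toy linear case, consecutive rows differ by a constant vector, which is the simplest instance of a controlled change in the representation producing a controlled change in the output.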
