Disentangling and Learning Robust Representations with Natural Clustering
This paper addresses representation learning for generalization in deep models, but the contribution is incremental, as it builds on existing variational autoencoder frameworks.
The paper tackles the problem of learning disentangled representations when generative factors follow multimodal distributions because of class distinctions in the data. It proposes N-VAE, which separates class-dependent factors from factors shared among classes, and reports that the model can detect and disentangle class-dependent factors and generate novel samples.
Learning representations that disentangle the underlying factors of variability in data is an intuitive way to achieve generalization in deep models. In this work, we address the scenario where generative factors follow a multimodal distribution because of class distinctions in the data. We propose N-VAE, a model capable of separating factors of variation that are exclusive to certain classes from factors that are shared among classes. This model implements an explicitly compositional latent variable structure by defining a class-conditioned latent space and a shared latent space. We show its usefulness for detecting and disentangling class-dependent generative factors, as well as its capacity to generate artificial samples containing characteristics unseen in the training data.
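
The compositional latent structure described above can be illustrated with a minimal sketch. This is not the N-VAE implementation: the abstract does not specify the form of the priors, so the unit-variance Gaussian centred on a per-class mean, the standard normal shared prior, and the names `sample_compositional_latent` and `class_means` are all assumptions made for illustration. The sketch only shows how a latent code might be composed from a class-conditioned part and a shared part.

```python
import random


def sample_compositional_latent(class_id, class_means, shared_dim, rng):
    """Draw one latent code z = z_class + z_shared (list concatenation).

    Assumed priors (not taken from the paper):
      z_class  ~ N(class_means[class_id], I)  -> one mode per class
      z_shared ~ N(0, I)                      -> same prior for every class
    """
    # Class-conditioned part: the mean depends on the class label.
    z_class = [rng.gauss(mu, 1.0) for mu in class_means[class_id]]
    # Shared part: independent of the class label.
    z_shared = [rng.gauss(0.0, 1.0) for _ in range(shared_dim)]
    return z_class + z_shared


rng = random.Random(0)
# Hypothetical setup: two classes with a 2-D class-conditioned space.
class_means = {0: [-3.0, 0.0], 1: [3.0, 0.0]}
z = sample_compositional_latent(1, class_means, shared_dim=3, rng=rng)
print(len(z))  # 5
```

Under these assumptions, the class-conditioned coordinates pooled over all classes form a multimodal distribution, while the shared coordinates stay unimodal. That is the kind of separation the abstract attributes to the two latent spaces.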