Disentangling to Cluster: Gaussian Mixture Variational Ladder Autoencoders
This work addresses hierarchical clustering in datasets with multiple attributes. The contribution is incremental, since it builds directly on Variational Ladder Autoencoders.
The paper tackles clustering of data with multiple attributes by proposing a disentangled clustering method, VLAC, which achieves higher cluster accuracy over digit identity than a Gaussian Mixture DGM on the SVHN test set.
In clustering, we normally output one cluster variable for each datapoint. However, there is not necessarily only one way to partition a given dataset into cluster components. For example, one could cluster objects by their colour or by their type. Different attributes form a hierarchy, and we may wish to cluster over any of them. By disentangling the learnt latent representations of a dataset into different layers for different attributes, we can then cluster in each of those latent spaces. We call this "disentangled clustering". Extending Variational Ladder Autoencoders (Zhao et al., 2017), we propose a clustering algorithm, VLAC, that outperforms a Gaussian Mixture DGM in cluster accuracy over digit identity on the test set of SVHN. We also demonstrate learning clusters jointly over multiple layers of the hierarchy of latent variables for the data, and show component-wise generation from this hierarchical model.
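To make the idea of clustering separately in each latent layer concrete, here is a minimal sketch. It is not the VLAC model or its inference procedure: it assumes the latent codes for each layer are already available and that each layer has its own diagonal-covariance Gaussian mixture with known parameters. The function name `gmm_responsibilities`, the layer names, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def gmm_responsibilities(z, weights, means, variances):
    """Posterior p(component | z) under a diagonal-covariance Gaussian mixture.

    z: (n, d) latent codes; weights: (k,); means: (k, d); variances: (k, d).
    """
    diff = z[:, None, :] - means[None, :, :]  # shape (n, k, d)
    log_pdf = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=-1
    )  # shape (n, k)
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


# Hypothetical latent codes for the same four datapoints in two layers.
# In this toy setup each layer is assumed to capture a different attribute.
z_layers = {
    "layer_1": np.array([[-2.0, 0.1], [-1.9, -0.2], [2.1, 0.0], [2.0, 0.3]]),
    "layer_2": np.array([[0.0, 3.0], [0.1, -3.1], [-0.2, 2.9], [0.0, -3.0]]),
}

# Hypothetical mixture parameters, one mixture per layer.
mixtures = {
    "layer_1": dict(
        weights=np.array([0.5, 0.5]),
        means=np.array([[-2.0, 0.0], [2.0, 0.0]]),
        variances=np.ones((2, 2)),
    ),
    "layer_2": dict(
        weights=np.array([0.5, 0.5]),
        means=np.array([[0.0, 3.0], [0.0, -3.0]]),
        variances=np.ones((2, 2)),
    ),
}

# One hard cluster assignment per datapoint per layer.
assignments = {
    name: gmm_responsibilities(z_layers[name], **mixtures[name]).argmax(axis=1)
    for name in z_layers
}
```

In this toy setup the same four datapoints receive a different partition in each layer, which is the sense in which a single dataset can admit more than one valid clustering.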