LG · CV · Jul 27, 2021

Improving ClusterGAN Using Self-Augmented Information Maximization of Disentangling Latent Spaces

arXiv:2107.12706v2 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses clustering in imbalanced datasets for unsupervised learning applications, representing an incremental improvement over existing methods.

The paper tackles ClusterGAN's suboptimal clustering performance, which stems from ignoring the real conditional distribution of the data. It proposes SIMI-ClusterGAN, which learns distinctive priors directly from the data, and reports improved performance on seven benchmark datasets as well as in imbalanced MNIST settings.

Since their introduction a few years ago, conditional generative models have seen remarkable achievements. However, they often require large amounts of labelled data. By combining unsupervised conditional generation with a clustering inference network, ClusterGAN has recently achieved impressive clustering results. Because ClusterGAN ignores the real conditional distribution of the data, its clustering inference network is trained only on generated samples drawn from a uniform prior, which limits its clustering performance. The true distribution, however, is not necessarily balanced. Consequently, ClusterGAN fails to produce all modes, which results in sub-optimal performance of the clustering inference network. It is therefore important to learn, in an unsupervised way, a prior that matches the real distribution. In this paper, we propose self-augmentation information maximization improved ClusterGAN (SIMI-ClusterGAN) to learn the distinctive priors directly from the data. The proposed SIMI-ClusterGAN consists of four deep neural networks: a self-augmentation prior network, a generator, a discriminator and a clustering inference network. The proposed method has been validated on seven benchmark datasets and has shown improved performance over state-of-the-art methods. To demonstrate the superiority of SIMI-ClusterGAN on imbalanced data, we discuss two imbalanced conditions on the MNIST dataset: a one-class imbalanced case and a three-class imbalanced case. The results highlight the advantages of SIMI-ClusterGAN.
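The difference between a uniform prior and a data-matched prior can be illustrated with a minimal sketch of latent-code sampling. It assumes the usual ClusterGAN-style latent layout (Gaussian noise concatenated with a one-hot cluster code), which the abstract does not spell out. The function name `sample_latent`, the dimensions, the noise scale and the imbalanced prior values are all hypothetical. In SIMI-ClusterGAN the prior would be learned by the self-augmentation prior network rather than fixed by hand as it is here.

```python
import numpy as np

def sample_latent(batch_size, prior, noise_dim=30, sigma=0.1, rng=None):
    """Draw latent codes z = (z_n, z_c).

    z_n is Gaussian noise; z_c is a one-hot cluster code whose class
    is drawn from `prior` (uniform in the ClusterGAN baseline, per the abstract).
    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()            # normalise to a probability vector
    k = prior.size
    labels = rng.choice(k, size=batch_size, p=prior)
    z_c = np.eye(k)[labels]                # one-hot cluster codes
    z_n = sigma * rng.standard_normal((batch_size, noise_dim))
    return np.concatenate([z_n, z_c], axis=1), labels

rng = np.random.default_rng(0)

# Uniform prior over 10 clusters.
uniform_prior = np.full(10, 0.1)

# Hypothetical one-class-imbalanced prior: cluster 0 is ten times rarer.
imbalanced_prior = np.array([0.1] + [1.0] * 9)

z, labels = sample_latent(5000, imbalanced_prior, rng=rng)
freq = np.bincount(labels, minlength=10) / labels.size
```

With the imbalanced prior, `freq[0]` comes out well below the other entries, so the generator would see the rare cluster about as often as it occurs in the data. With `uniform_prior`, every cluster is sampled equally often regardless of how the data is distributed.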

