CV · Dec 5, 2019

Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

arXiv:1912.02678v3 · 18 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of clustering unlabeled images, which is relevant to computer vision researchers, and offers incremental improvements over existing deep learning methods.

The paper tackles unsupervised image clustering by proposing Multi-Modal Deep Clustering (MMDC), which trains a deep network to align its embeddings with samples from a Gaussian Mixture Model and, in parallel, to solve a self-supervised rotation-prediction task. It matches or exceeds state-of-the-art results, including 82% accuracy on CIFAR-10 and improvements of up to 20 absolute accuracy points on natural-image benchmarks.

The clustering of unlabeled raw images is a daunting task, which has recently been approached with some success by deep learning methods. Here we propose an unsupervised clustering framework that learns a deep neural network in an end-to-end fashion, providing direct cluster assignments of images without additional processing. Multi-Modal Deep Clustering (MMDC) trains a deep network to align its image embeddings with target points sampled from a Gaussian Mixture Model distribution. The cluster assignments are then determined by the mixture component with which each image embedding is associated. Simultaneously, the same deep network is trained to solve an additional self-supervised task of predicting image rotations. This pushes the network to learn more meaningful image representations that facilitate better clustering. Experimental results show that MMDC achieves or exceeds state-of-the-art performance on six challenging benchmarks. On natural image datasets we improve on previous results by significant margins of up to 20 absolute accuracy points, yielding an accuracy of 82% on CIFAR-10, 45% on CIFAR-100 and 69% on STL-10.
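To make the ingredients described in the abstract more concrete, here is a minimal pure-Python sketch: sampling target points from a Gaussian Mixture Model, aligning embeddings with those targets, assigning clusters by mixture component, and generating the rotation-prediction task. It is not the authors' implementation. Several details are assumptions not stated in the abstract: the GMM uses equal weights and a shared isotropic standard deviation `sigma`, the component means are placed on scaled coordinate axes, and embeddings are paired with targets by a greedy one-to-one matching that only approximates a minimum-cost assignment. All function names are hypothetical.

```python
import random


def make_component_means(k, dim, scale=1.0):
    # Hypothetical choice: mean j is scale * e_j (the j-th coordinate axis).
    if dim < k:
        raise ValueError("this sketch needs dim >= k")
    return [[scale if d == j else 0.0 for d in range(dim)] for j in range(k)]


def sample_targets(means, n, sigma, rng):
    # Draw n points from an equal-weight isotropic GMM:
    # pick a component uniformly at random, then add Gaussian noise per coordinate.
    targets = []
    for _ in range(n):
        mu = rng.choice(means)
        targets.append([m + rng.gauss(0.0, sigma) for m in mu])
    return targets


def sq_dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_match(embeddings, targets):
    # Greedy one-to-one pairing: repeatedly take the closest unused
    # (embedding, target) pair. An approximation of min-cost assignment.
    pairs = sorted(
        (sq_dist(e, t), i, j)
        for i, e in enumerate(embeddings)
        for j, t in enumerate(targets)
    )
    used_e, used_t, match = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_e and j not in used_t:
            match[i] = j
            used_e.add(i)
            used_t.add(j)
    return match


def alignment_loss(embeddings, targets):
    # Mean squared distance between each embedding and its matched target.
    match = greedy_match(embeddings, targets)
    return sum(sq_dist(embeddings[i], targets[j]) for i, j in match.items()) / len(match)


def assign_clusters(embeddings, means):
    # With equal weights and a shared isotropic covariance, the component
    # with the highest posterior is the one with the nearest mean.
    return [min(range(len(means)), key=lambda j: sq_dist(e, means[j])) for e in embeddings]


def rot90(img):
    # Rotate a 2-D list (H x W) by 90 degrees counter-clockwise.
    return [list(row) for row in zip(*img)][::-1]


def rotation_batch(images):
    # Self-supervised task: each image yields 4 copies rotated by
    # 0, 90, 180 and 270 degrees, labelled 0, 1, 2 and 3.
    out = []
    for img in images:
        r = img
        for label in range(4):
            out.append((r, label))
            r = rot90(r)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    means = make_component_means(k=3, dim=3, scale=5.0)
    emb = [[5.1, 0.2, -0.1], [0.0, 4.8, 0.3], [0.1, -0.2, 5.2]]
    targets = sample_targets(means, n=3, sigma=0.1, rng=rng)
    print(alignment_loss(emb, targets))
    print(assign_clusters(emb, means))  # [0, 1, 2]
```

Under these assumptions, `assign_clusters` reduces to a nearest-mean rule. In a full training loop, the alignment loss would presumably be combined with a cross-entropy loss on the labels produced by `rotation_batch`, with the same network producing both the embedding and the rotation logits; how the two terms are weighted is likewise an assumption here.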

Code Implementations: 1 repo
