Joint Unsupervised Learning of Deep Representations and Image Clusters
This work addresses the challenge of unsupervised image clustering and representation learning for computer vision applications, offering an incremental improvement by integrating clustering and representation learning into a single end-to-end model.
The paper tackles the problem of unsupervised learning by jointly training deep representations and image clusters using a recurrent framework, achieving state-of-the-art performance on image clustering across various datasets and showing that the learned representations generalize well to other tasks.
In this paper, we propose a recurrent framework for Joint Unsupervised LEarning (JULE) of deep representations and image clusters. In our framework, successive operations in a clustering algorithm are expressed as steps in a recurrent process, stacked on top of representations output by a Convolutional Neural Network (CNN). During training, image clusters and representations are updated jointly: image clustering is conducted in the forward pass, and representation learning in the backward pass. The key idea behind this framework is that good representations are beneficial to image clustering, and clustering results provide supervisory signals to representation learning. By integrating the two processes into a single model with a unified weighted triplet loss and optimizing it end-to-end, we can obtain not only more powerful representations but also more precise image clusters. Extensive experiments show that our method outperforms state-of-the-art methods on image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to other tasks.
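The alternation described above (clustering in the forward pass, representation updates driven by a triplet loss in the backward pass) can be illustrated with a small, hypothetical sketch. It is not the authors' implementation and makes several loud assumptions: the clustering algorithm is taken to be agglomerative, so each merge of two clusters counts as one recurrent step; cluster affinity is average-linkage squared Euclidean distance; the CNN is replaced by free feature vectors that are updated directly by gradient descent; and the loss is a hinge-style triplet loss whose per-triplet weight comes from an optional, hypothetical `weight_fn`. All function names (`agglomerative_steps`, `weighted_triplet_loss`, `jule_sketch`) are illustrative only.

```python
import itertools


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerative_steps(features, n_clusters):
    """Forward pass (assumed agglomerative): each merge is one recurrent step."""
    clusters = [[i] for i in range(len(features))]
    history = []  # cluster assignment after every merge step
    while len(clusters) > n_clusters:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            # Average-linkage distance: an assumed affinity, not the paper's.
            d = sum(sq_dist(features[i], features[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
        history.append([list(c) for c in clusters])
    labels = [0] * len(features)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels, history


def weighted_triplet_loss(features, labels, margin=1.0, weight_fn=None):
    """Hinge triplet loss using cluster labels as pseudo-supervision.

    Anchor i and positive j share a cluster; negative k does not.
    weight_fn(i, j, k) is a hypothetical per-triplet weight (default 1.0).
    Returns the loss and its gradient with respect to each feature vector.
    """
    grads = [[0.0] * len(f) for f in features]
    loss = 0.0
    for i, j, k in itertools.permutations(range(len(features)), 3):
        if labels[i] != labels[j] or labels[i] == labels[k]:
            continue
        fi, fj, fk = features[i], features[j], features[k]
        w = 1.0 if weight_fn is None else weight_fn(i, j, k)
        violation = margin + sq_dist(fi, fj) - sq_dist(fi, fk)
        if violation > 0:
            loss += w * violation
            for d in range(len(fi)):
                grads[i][d] += w * 2 * (fk[d] - fj[d])
                grads[j][d] += w * -2 * (fi[d] - fj[d])
                grads[k][d] += w * 2 * (fi[d] - fk[d])
    return loss, grads


def jule_sketch(features, n_clusters, rounds=3, lr=0.01):
    """Alternate clustering (forward) and representation updates (backward).

    The feature vectors stand in for CNN outputs and are updated directly.
    """
    feats = [list(f) for f in features]
    losses = []
    labels = None
    for _ in range(rounds):
        labels, _ = agglomerative_steps(feats, n_clusters)   # cluster
        loss, grads = weighted_triplet_loss(feats, labels)  # supervise
        losses.append(loss)
        feats = [[x - lr * g for x, g in zip(f, gf)]
                 for f, gf in zip(feats, grads)]
    return labels, feats, losses
```

For example, `jule_sketch([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0], [5.0, 5.2]], n_clusters=2)` groups the first three and last three points; because those groups are already well separated, the triplet loss is zero and the features stay unchanged. In a real system the gradients would flow back through the CNN rather than into the features themselves.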