Improving Deep Image Clustering With Spatial Transformer Layers
This work addresses a specific limitation in image clustering for computer vision applications, but it is incremental as it combines existing techniques without introducing a fundamentally new approach.
The paper addresses the sensitivity of deep image clustering to spatial transformations such as scale and rotation by integrating Spatial Transformer Networks into a deep clustering model; the resulting model outperforms the baseline on the MNIST and FashionMNIST datasets.
Image clustering is an important but challenging task in machine learning. As in most image processing areas, the latest improvements have come from deep learning models. However, classical deep learning methods struggle with spatial image transformations such as scale and rotation. In this paper, we propose the use of visual attention techniques to mitigate this problem in image clustering methods. We evaluate the combination of a deep image clustering model called Deep Adaptive Clustering (DAC) with Spatial Transformer Networks (STN). The proposed model is evaluated on the MNIST and FashionMNIST datasets and outperforms the baseline model.
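To illustrate the kind of operation a spatial transformer layer performs, the following is a minimal pure-Python sketch of its two resampling stages: an affine sampling grid followed by bilinear interpolation. It is an illustration under stated assumptions, not the implementation evaluated in this work. The function names (affine_grid, bilinear_sample, spatial_transform) are hypothetical, coordinates are assumed to be normalized to [-1, 1] with the corner pixels at the extremes, and the 2x3 matrix theta is supplied by hand here, whereas in an STN it would be predicted by a learned localization network.

```python
import math


def affine_grid(theta, height, width):
    """For each output pixel, compute the source location to sample from.

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]] acting on
    normalized coordinates in [-1, 1] (assumed convention: corner pixels
    sit exactly at -1 and 1).
    """
    grid = []
    for i in range(height):
        y = (-1.0 + 2.0 * i / (height - 1)) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = (-1.0 + 2.0 * j / (width - 1)) if width > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample image (a list of rows) at the grid locations.

    Locations outside the image contribute zero (zero padding).
    """
    height, width = len(image), len(image[0])
    output = []
    for grid_row in grid:
        out_row = []
        for xs, ys in grid_row:
            # Convert normalized coordinates back to pixel coordinates.
            px = (xs + 1.0) * (width - 1) / 2.0
            py = (ys + 1.0) * (height - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            wx, wy = px - x0, py - y0
            value = 0.0
            # Weighted sum over the four neighbouring pixels.
            for yy, w_y in ((y0, 1.0 - wy), (y0 + 1, wy)):
                for xx, w_x in ((x0, 1.0 - wx), (x0 + 1, wx)):
                    if 0 <= yy < height and 0 <= xx < width:
                        value += w_y * w_x * image[yy][xx]
            out_row.append(value)
        output.append(out_row)
    return output


def spatial_transform(image, theta):
    """Resample image under the affine map theta (grid, then sampling)."""
    grid = affine_grid(theta, len(image), len(image[0]))
    return bilinear_sample(image, grid)
```

With the identity matrix [[1, 0, 0], [0, 1, 0]] the input is returned unchanged up to floating-point error, and [[-1, 0, 0], [0, -1, 0]] rotates it by 180 degrees. Because both stages are differentiable with respect to theta, a clustering model placed after such a layer can in principle learn to undo scale and rotation changes before assigning clusters.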