CV · LG · Jan 7, 2021

Combining pretrained CNN feature extractors to enhance clustering of complex natural images

arXiv:2101.02767v1 · 50 citations
Originality: Incremental advance
AI Analysis

This work provides a method to improve image clustering for researchers and practitioners by combining features from multiple pretrained CNNs, addressing the challenge of selecting a single optimal feature extractor.

This paper investigates how the choice of pretrained CNN architecture affects image clustering performance. It finds that the architecture significantly affects results and that selecting the best one for a given task is difficult. To address this, the authors propose a multi-view clustering approach that treats features from different CNNs as distinct views, achieving state-of-the-art results on nine natural image datasets.

Recently, a common starting point for solving complex unsupervised image classification tasks has been to use generic features extracted with deep Convolutional Neural Networks (CNNs) pretrained on a large and versatile dataset (ImageNet). However, in most research, the CNN architecture for feature extraction is chosen arbitrarily, without justification. This paper aims to provide insight into the use of pretrained CNN features for image clustering (IC). First, extensive experiments are conducted and show that, for a given dataset, the choice of the CNN architecture for feature extraction has a huge impact on the final clustering. These experiments also demonstrate that proper extractor selection for a given IC task is difficult. To solve this issue, we propose to rephrase the IC problem as a multi-view clustering (MVC) problem that considers features extracted from different architectures as different "views" of the same data. This approach is based on the assumption that the information contained in the different CNNs may be complementary, even when they are pretrained on the same data. We then propose a multi-input neural network architecture that is trained end-to-end to solve the MVC problem effectively. This approach is tested on nine natural image datasets and produces state-of-the-art results for IC.
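
The complementarity assumption behind the multi-view formulation can be illustrated with a toy sketch. This is not the paper's end-to-end multi-input network: it swaps in a naive baseline (per-view L2 normalization, concatenation, then k-means with farthest-first initialization) and uses synthetic vectors as stand-ins for features from two pretrained CNNs. All function names, the data generator, and the noise level are hypothetical choices made for illustration only.

```python
import math
import random
from collections import Counter

def make_views(n_per_class=20, noise=0.05, seed=0):
    """Synthetic stand-ins for two feature extractors (an assumption, not real CNN output).
    View A only distinguishes classes {0,1} from {2,3}; view B only {0,2} from {1,3}."""
    rng = random.Random(seed)
    protos = [[1.0, 0.0], [0.0, 1.0]]
    view_a, view_b, truth = [], [], []
    for cls in range(4):
        for _ in range(n_per_class):
            view_a.append([x + rng.gauss(0, noise) for x in protos[cls // 2]])
            view_b.append([x + rng.gauss(0, noise) for x in protos[cls % 2]])
            truth.append(cls)
    return view_a, view_b, truth

def l2_normalize(rows):
    """Scale each feature vector to unit Euclidean length."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r)) or 1.0
        out.append([x / norm for x in r])
    return out

def concat_views(views):
    """Normalize each view separately, then concatenate the views sample by sample."""
    normed = [l2_normalize(v) for v in views]
    combined = []
    for i in range(len(views[0])):
        row = []
        for v in normed:
            row.extend(v[i])
        combined.append(row)
    return combined

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point that is farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; stops when assignments no longer change."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(pred, truth):
    """Fraction of samples that belong to the majority true class of their cluster."""
    total = 0
    for c in set(pred):
        total += Counter(t for p, t in zip(pred, truth) if p == c).most_common(1)[0][1]
    return total / len(truth)

view_a, view_b, truth = make_views()
combined = concat_views([view_a, view_b])
labels_combined = kmeans(combined, 4)
purity_a = purity(kmeans(l2_normalize(view_a), 4), truth)
purity_b = purity(kmeans(l2_normalize(view_b), 4), truth)
purity_combined = purity(labels_combined, truth)
print("view A alone:", purity_a)
print("view B alone:", purity_b)
print("both views:  ", purity_combined)
```

In this constructed example each view carries only half of the class information, so neither view can separate all four classes by itself, while the concatenated representation can. Real pretrained features overlap far more than this, which is part of why the paper learns the combination end-to-end rather than simply concatenating.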

