Classification-Specific Parts for Improving Fine-Grained Visual Categorization
This work addresses the challenge of distinguishing visually similar categories in fine-grained classification, offering an incremental improvement over existing part-based methods by automating part selection and scale determination.
The paper tackles fine-grained visual categorization by proposing a classification-specific part estimation method that uses initial predictions and gradient-based feature importance to automatically detect relevant image regions with spatial extent, achieving improved performance on multiple datasets.
Fine-grained visual categorization is a classification task for distinguishing categories with high intra-class and low inter-class variance. While global approaches use the whole image for classification, part-based solutions gather additional local information in the form of attention regions or parts. We propose a novel classification-specific part estimation that uses an initial prediction together with back-propagation of feature importance via gradient computations to estimate relevant image regions. The subsequently detected parts are not only selected using a posteriori classification knowledge, but also have an intrinsic spatial extent that is determined automatically. This is in contrast to most part-based approaches, and even to available ground-truth part annotations, which provide only point coordinates and no scale information. Our experiments on several widely used fine-grained datasets demonstrate the effectiveness of this part selection method in conjunction with the extracted part features.
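The idea of deriving parts from an initial prediction and its gradients can be illustrated with a minimal sketch. It is not the implementation used in this work: a toy linear classifier stands in for the CNN, the saliency threshold (the mean saliency) is an assumed choice, and the function names `saliency_from_linear_model` and `estimate_parts` are hypothetical. The sketch only shows how gradient-based importance yields regions that carry their own spatial extent.

```python
from collections import deque

import numpy as np


def saliency_from_linear_model(image, weights):
    """Return the predicted class and a gradient-based saliency map.

    Toy stand-in for a CNN (an assumption of this sketch): the class scores
    are weights @ image.ravel(), so the gradient of score[c] with respect to
    the image is simply weights[c] reshaped to the image shape.
    """
    scores = weights @ image.ravel()
    predicted = int(np.argmax(scores))  # initial prediction
    gradient = weights[predicted].reshape(image.shape)
    return predicted, np.abs(gradient)


def estimate_parts(saliency, threshold=None):
    """Return inclusive boxes (y0, x0, y1, x1) of salient connected regions.

    Pixels above the threshold are grouped by 4-connectivity; each group
    becomes one part whose extent is given by its bounding box.
    """
    if threshold is None:
        threshold = saliency.mean()  # assumed heuristic, not from the paper
    mask = saliency > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y, x] or visited[y, x]:
                continue
            # Breadth-first search over one connected salient region.
            queue = deque([(y, x)])
            visited[y, x] = True
            y0, x0, y1, x1 = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                y0, x0 = min(y0, cy), min(x0, cx)
                y1, x1 = max(y1, cy), max(x1, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((y0, x0, y1, x1))
    return boxes
```

Because each part is returned as a box rather than a single point, its scale follows directly from the saliency map, which mirrors the contrast drawn above with point-only part annotations.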