Re-rank Coarse Classification with Local Region Enhanced Features for Fine-Grained Image Recognition
This work addresses the challenge of integrating global and local features for fine-grained image recognition. The contribution is incremental, as it builds on existing coarse-to-fine approaches.
The paper tackles fine-grained image recognition by proposing a retrieval-based coarse-to-fine framework that re-ranks the top classification results using local region enhanced features, reporting state-of-the-art performance on three benchmarks: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
Fine-grained image recognition is challenging because it is difficult to capture both semantic global features and discriminative local features. Moreover, these two kinds of features are hard to integrate and can even conflict when used simultaneously. In this paper, we propose a retrieval-based coarse-to-fine framework that re-ranks the TopN classification results using local region enhanced embedding features to improve Top1 accuracy, based on the observation that the correct category usually resides in the TopN results. To obtain the discriminative regions that distinguish fine-grained images, we introduce a weakly-supervised method that trains a box generating branch with only image-level labels. In addition, to learn more effective semantic global features, we design a multi-level loss over an automatically constructed hierarchical category structure. Experimental results show that our method achieves state-of-the-art performance on three benchmarks: CUB-200-2011, Stanford Cars, and FGVC Aircraft. Visualizations and analysis are also provided for better understanding.
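The coarse-to-fine idea in the abstract (keep the TopN classes from the classifier, then re-rank them by retrieval over embedding features) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `cosine` and `rerank_top_n`, the use of cosine similarity, the max-over-gallery aggregation, and the toy 3-dimensional vectors are all assumptions made for illustration. In the paper, the embeddings would come from the local region enhanced features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rerank_top_n(class_scores, query_embedding, gallery, n=3):
    """Hypothetical helper: re-rank the top-n classes by embedding retrieval.

    class_scores: dict mapping class name -> coarse classification score
    gallery: dict mapping class name -> list of reference embeddings
    """
    # Coarse stage: keep only the n classes with the highest classifier score.
    top_n = sorted(class_scores, key=class_scores.get, reverse=True)[:n]

    # Fine stage: score each candidate by its best-matching gallery embedding.
    def retrieval_score(cls):
        return max(cosine(query_embedding, g) for g in gallery[cls])

    return sorted(top_n, key=retrieval_score, reverse=True)


# Toy data (made up): the classifier slightly prefers "sparrow",
# but the query embedding is closest to the "finch" gallery.
scores = {"sparrow": 0.40, "finch": 0.35, "wren": 0.20, "crow": 0.05}
gallery = {
    "sparrow": [[1.0, 0.0, 0.0]],
    "finch": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.1]],
    "wren": [[0.0, 0.0, 1.0]],
    "crow": [[0.1, 0.9, 0.0]],
}
query = [0.1, 0.95, 0.05]

result = rerank_top_n(scores, query, gallery, n=3)
print(result)
```

In this toy run the Top1 prediction changes from "sparrow" to "finch" after re-ranking, while "crow" is never considered because it falls outside the TopN candidates, even though its gallery embedding is close to the query. That restriction is what makes the approach depend on the observation that the correct category usually resides in the TopN results.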