CV, May 11, 2020

Fine-Grained Visual Classification with Efficient End-to-end Localization

arXiv:2005.05123v1, 6 citations
Originality: Incremental advance
AI Analysis

This work targets classification accuracy among visually similar classes. It is an incremental advance: it builds on existing localization methods and contributes mainly efficiency gains.

The paper tackles fine-grained visual classification by introducing an efficient end-to-end localization module that avoids multiple passes through a full classification network and complex training schedules, achieving competitive recognition performance on the CUB200-2011, Stanford Cars, and FGVC-Aircraft benchmarks.

The term fine-grained visual classification (FGVC) refers to classification tasks where the classes are very similar and the classification model needs to be able to find subtle differences to make the correct prediction. State-of-the-art approaches often include a localization step designed to help a classification network by localizing the relevant parts of the input images. However, this usually requires multiple iterations or passes through a full classification network or complex training schedules. In this work, we present an efficient localization module that can be fused with a classification network in an end-to-end setup. On the one hand, the module is trained by the gradient flowing back from the classification network. On the other hand, two self-supervised loss functions are introduced to increase the localization accuracy. We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars, and FGVC-Aircraft and achieve competitive recognition performance.
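The abstract does not say how the gradient reaches the localization module. One common way to make a crop trainable is to resample the predicted box with bilinear interpolation, as in spatial-transformer-style models. The sketch below illustrates that idea only and should not be read as the paper's module: the function names, the box parameterization (centre and size in pixels), and the mean-intensity `score` standing in for a classifier loss are all assumptions made for illustration.

```python
# Illustrative only: a bilinear box crop in pure Python (not the paper's module).

def bilinear(img, x, y):
    """Sample a 2D list `img` at continuous pixel coordinates (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def crop(img, cx, cy, bw, bh, out=4):
    """Resample the box centred at (cx, cy) with size (bw, bh) to an out x out patch."""
    patch = []
    for i in range(out):
        row = []
        for j in range(out):
            # Map each output cell centre to a continuous source coordinate.
            x = cx + ((j + 0.5) / out - 0.5) * bw
            y = cy + ((i + 0.5) / out - 0.5) * bh
            row.append(bilinear(img, x, y))
        patch.append(row)
    return patch


def score(img, cx, cy, bw, bh):
    """Hypothetical stand-in for a classifier loss: mean intensity of the crop."""
    patch = crop(img, cx, cy, bw, bh)
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))


# Toy 8x8 image whose intensity increases from left to right.
image = [[float(c) for c in range(8)] for _ in range(8)]

# Central finite-difference estimate of d(score)/d(cx). It is non-zero, so a
# box predictor upstream of the crop could receive a training signal.
eps = 1e-3
grad_cx = (score(image, 3.5 + eps, 3.5, 4.0, 4.0)
           - score(image, 3.5 - eps, 3.5, 4.0, 4.0)) / (2 * eps)
```

Because every output pixel is a weighted average of its neighbours, the crop varies smoothly with the box parameters. A hard integer crop would not, which is why sampling-based crops are a frequent choice when a localizer is trained only through a downstream loss. In an autodiff framework the same computation would be expressed with the framework's own sampling operator instead of a finite difference.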

