CV | Nov 27, 2018

Generating Attention from Classifier Activations for Fine-grained Recognition

arXiv:1811.10770v1
Originality: Incremental advance
AI Analysis

The paper offers a simpler way to localize objects for fine-grained recognition, but the advance is incremental because it builds on existing attention-based methods.

The paper tackles fine-grained recognition by generating attention maps from classifier activations, reporting state-of-the-art results of 87.9% on CUB-200-2011, 94.1% on Stanford Cars, and 92.1% on FGVC-Aircraft.

Recent advances in fine-grained recognition utilize attention maps to localize objects of interest. Although there are many ways to generate attention maps, most of them rely on sophisticated loss functions or complex training processes. In this work, we propose a simple and straightforward attention generation model based on the output activations of classifiers. The advantage of our model is that it can be easily trained with image-level labels and softmax loss functions. More specifically, multiple linear local classifiers are first applied to perform fine-grained classification at each location of the high-level CNN feature maps. The attention map is generated by aggregating and max-pooling the output activations. The attention map then serves as a surrogate target object mask to train those local classifiers, similar to training models for semantic segmentation. Our model achieves state-of-the-art results on three heavily benchmarked datasets, namely 87.9% on the CUB-200-2011 dataset, 94.1% on the Stanford Cars dataset and 92.1% on the FGVC-Aircraft dataset, demonstrating its effectiveness on fine-grained recognition tasks.
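
The attention-generation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify how activations are aggregated or along which axis they are max-pooled, so the sketch assumes a sum over the local classifiers followed by a max over classes at each location. The function name, the array shapes, the min-max normalization, and the mean threshold used to form the surrogate mask are all hypothetical choices made for illustration.

```python
import numpy as np


def attention_from_local_classifiers(features, weights, biases):
    """Build an attention map from per-location linear classifier scores.

    features: (H, W, D) high-level CNN feature map
    weights:  (K, D, C) weights of K linear local classifiers over C classes
    biases:   (K, C)    biases of the K classifiers
    Returns an (H, W) attention map scaled to [0, 1].
    """
    # Apply every linear classifier at every spatial location: (K, H, W, C).
    scores = np.einsum("hwd,kdc->khwc", features, weights)
    scores = scores + biases[:, None, None, :]

    # ASSUMPTION: "aggregating" is taken to mean summing over the K classifiers.
    aggregated = scores.sum(axis=0)  # (H, W, C)

    # ASSUMPTION: "max-pooling" is taken over the class axis, so each location
    # keeps its strongest class response.
    attention = aggregated.max(axis=-1)  # (H, W)

    # ASSUMPTION: min-max normalization, added here only for readability.
    span = attention.max() - attention.min()
    if span > 0:
        attention = (attention - attention.min()) / span
    else:
        attention = np.zeros_like(attention)
    return attention


# Toy inputs standing in for real CNN features and trained classifiers.
rng = np.random.default_rng(0)
features = rng.standard_normal((14, 14, 64))
weights = rng.standard_normal((3, 64, 200)) * 0.01
biases = np.zeros((3, 200))

attention = attention_from_local_classifiers(features, weights, biases)

# ASSUMPTION: thresholding at the mean gives a binary surrogate object mask.
mask = attention > attention.mean()
```

In a training loop, a mask like this would presumably select which locations contribute to the softmax loss of the local classifiers, in the spirit of the semantic-segmentation analogy drawn in the abstract. The exact loss weighting is not given here.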

