CV · LG · ML · Mar 2, 2020

Weakly-supervised Object Localization for Few-shot Learning and Fine-grained Few-shot Learning

arXiv:2003.00874v3 · 2 citations
AI Analysis

This paper addresses the problem of learning from very few samples for both general and fine-grained visual categories, which is crucial for real-world applications where data is scarce. It represents an incremental improvement over existing methods.

The paper tackles few-shot learning (FSL) and fine-grained FSL by using weakly-supervised object localization to identify discriminative regions. It reports state-of-the-art performance on benchmark datasets, especially for fine-grained tasks, and better generalization than previous methods when evaluated on datasets other than the one used for training.

Few-shot learning (FSL) aims to learn novel visual categories from very few samples, which is a challenging problem in real-world applications. Many few-shot classification methods work well on general images by learning global representations. However, they cannot handle fine-grained categories equally well, because they lack subtle, local information. We argue that localization is an efficient approach because it directly provides the discriminative regions, which are critical for both general and fine-grained classification in a low-data regime. In this paper, we propose a Self-Attention Based Complementary Module (SAC Module) to perform weakly-supervised object localization and, more importantly, to produce the activated masks used to select discriminative deep descriptors for few-shot classification. Based on each selected deep descriptor, the Semantic Alignment Module (SAM) calculates the semantic alignment distance between the query and support images to boost classification performance. Extensive experiments show that our method outperforms state-of-the-art methods on benchmark datasets under various settings, especially on fine-grained few-shot tasks. In addition, our method outperforms previous methods when the model is trained on miniImageNet and evaluated on different datasets, demonstrating its superior generalization capacity. Additional visualizations show that the proposed method can localize the key objects more completely.
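The mask-then-match idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's SAC Module or SAM: the abstract does not define the mask computation or the semantic alignment distance. Here the mask is simply given as input, and the alignment score is assumed to be the mean best-match cosine similarity between descriptors. The names `select_descriptors`, `alignment_score`, `classify`, and the `threshold` parameter are hypothetical.

```python
import numpy as np


def select_descriptors(feature_map, mask, threshold=0.5):
    """Keep the local descriptors whose mask activation is high.

    feature_map: array of shape (C, H, W), one C-dim descriptor per location.
    mask: array of shape (H, W) with activations in [0, 1] (assumed given).
    Returns an array of shape (N, C) with the selected descriptors.
    """
    c, h, w = feature_map.shape
    descriptors = feature_map.reshape(c, h * w).T  # (H*W, C)
    keep = mask.reshape(h * w) >= threshold
    if not keep.any():
        # Assumption: fall back to all locations if the mask selects nothing.
        keep[:] = True
    return descriptors[keep]


def alignment_score(query_desc, support_desc, eps=1e-8):
    """Mean over query descriptors of the best cosine match in the support set.

    This nearest-descriptor matching is a stand-in assumption, not the
    paper's definition of semantic alignment distance.
    """
    q = query_desc / (np.linalg.norm(query_desc, axis=1, keepdims=True) + eps)
    s = support_desc / (np.linalg.norm(support_desc, axis=1, keepdims=True) + eps)
    similarity = q @ s.T  # (Nq, Ns) pairwise cosine similarities
    return float(similarity.max(axis=1).mean())


def classify(query_fm, query_mask, support_fms, support_masks):
    """Return the index of the best-matching support image and all scores."""
    q = select_descriptors(query_fm, query_mask)
    scores = [
        alignment_score(q, select_descriptors(fm, m))
        for fm, m in zip(support_fms, support_masks)
    ]
    return int(np.argmax(scores)), scores
```

In a 5-way 1-shot episode, `support_fms` would hold five feature maps from a backbone network, and the query would be assigned the class of the highest-scoring support image. Restricting the comparison to masked locations is what keeps background descriptors from dominating the score.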

