CV · Apr 2, 2022

Matching Feature Sets for Few-Shot Image Classification

arXiv:2204.00949v1 · 107 citations · h-index: 69
Originality: Highly original
AI Analysis

This work aims to improve few-shot image classification accuracy by introducing a set-based image representation. The approach is incremental in that it builds on existing encoder architectures rather than replacing them.

The paper tackles few-shot image classification by extracting a set of feature vectors per image instead of a single vector, arguing that this yields richer representations that transfer better to few-shot classes. The method, SetFeat, adapts existing encoders with lightweight self-attention modules and classifies images with a set-to-set matching metric. It outperforms the state of the art on miniImageNet, tieredImageNet, and CUB in all but one of the reported 1- and 5-shot cases.

In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets -- namely miniImageNet, tieredImageNet, and CUB -- in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.
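To make the idea of set-to-set matching concrete, here is a minimal sketch in plain Python. It rests on assumptions that the abstract does not confirm: the distance is a sum of minimum Euclidean distances, which is one common way to compare two sets, and class prototypes are formed by averaging the k-th feature vector across support images. The names `sum_min_distance`, `class_prototypes`, and `classify` are hypothetical. The feature sets are hand-written lists standing in for encoder outputs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sum_min_distance(query_set, prototype_set):
    """Assumed set-to-set distance: for each query vector, take the distance
    to its nearest prototype vector, then sum over the query set."""
    return sum(min(euclidean(q, p) for p in prototype_set) for q in query_set)


def class_prototypes(support_sets):
    """Average the k-th feature vector across all support images of a class.

    support_sets: list of images, each a list of M feature vectors.
    Returns a list of M prototype vectors.
    """
    n = len(support_sets)
    return [
        [sum(vals) / n for vals in zip(*kth_vectors)]
        for kth_vectors in zip(*support_sets)
    ]


def classify(query_set, support_by_class):
    """Return the label whose prototype set is closest to the query set."""
    prototypes = {
        label: class_prototypes(sets) for label, sets in support_by_class.items()
    }
    return min(
        prototypes, key=lambda label: sum_min_distance(query_set, prototypes[label])
    )


# Toy episode: 2 classes, each image represented by a set of 2 vectors (illustrative data).
support = {
    "cat": [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [1.0, 2.0]]],  # 2 shots
    "dog": [[[5.0, 5.0], [6.0, 5.0]]],                            # 1 shot
}
query = [[0.2, 0.9], [0.9, 1.1]]
print(classify(query, support))
```

In this toy episode the query set lies next to the averaged "cat" prototypes, so the smallest set-to-set distance selects that class. A real pipeline would replace the hand-written lists with the sets of vectors produced by the adapted encoder.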
