CVAIOct 10, 2022

FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

arXiv:2210.04845v261 citationsh-index: 32
AI Analysis

This addresses the problem of detecting novel objects with minimal examples for practical applications, representing a significant advance over existing methods.

The paper tackles few-shot object detection by introducing FS-DETR, a transformer-based method that uses visual prompting without requiring fine-tuning at test time. It achieves state-of-the-art accuracy, matching or outperforming fine-tuning methods on PASCAL VOC and MSCOCO benchmarks.

This paper is on Few-Shot Object Detection (FSOD), where given a few templates (examples) depicting a novel class (not seen during training), the goal is to detect all of its occurrences within a set of images. From a practical perspective, an FSOD system must fulfil the following desiderata: (a) it must be used as is, without requiring any fine-tuning at test time, (b) it must be able to process an arbitrary number of novel objects concurrently while supporting an arbitrary number of examples from each class and (c) it must achieve accuracy comparable to a closed system. Towards satisfying (a)-(c), in this work, we make the following contributions: We introduce, for the first time, a simple, yet powerful, few-shot detection transformer (FS-DETR) based on visual prompting that can address both desiderata (a) and (b). Our system builds upon the DETR framework, extending it based on two key ideas: (1) feed the provided visual templates of the novel classes as visual prompts during test time, and (2) ``stamp'' these prompts with pseudo-class embeddings (akin to soft prompting), which are then predicted at the output of the decoder. Importantly, we show that our system is not only more flexible than existing methods, but also, it makes a step towards satisfying desideratum (c). Specifically, it is significantly more accurate than all methods that do not require fine-tuning and even matches and outperforms the current state-of-the-art fine-tuning based methods on the most well-established benchmarks (PASCAL VOC & MSCOCO).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes