One-Shot Object Detection without Fine-Tuning
This work addresses the scarcity of annotated training examples in object detection by enabling detection of unseen classes from a single example, an incremental but practical step toward expanding the set of detectable object categories.
The paper tackles the one-shot object detection problem by introducing a two-stage model that integrates metric learning with an anchor-free detection pipeline, eliminating the need for fine-tuning on support images. The method consistently exceeds state-of-the-art performance on multiple datasets.
Deep learning has revolutionized object detection thanks to large-scale datasets, but the object categories these datasets cover are still arguably very limited. In this paper, we attempt to enrich such categories by addressing the one-shot object detection problem, where the number of annotated training examples for learning an unseen class is limited to one. We introduce a two-stage model consisting of a first-stage Matching-FCOS network and a second-stage Structure-Aware Relation Module, the combination of which integrates metric learning with an anchor-free Faster R-CNN-style detection pipeline, thereby eliminating the need to fine-tune on the support images. We also propose novel training strategies that effectively improve detection performance. We performed extensive quantitative and qualitative evaluations, and our method consistently exceeds the state-of-the-art one-shot performance on multiple datasets.
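To illustrate the general idea of matching a single support example against a query image by metric learning, without any fine-tuning, the following is a minimal sketch in NumPy. It is not the Matching-FCOS network or the Structure-Aware Relation Module described above. It assumes, purely for illustration, that features have already been extracted by some backbone, that the support is summarized by global average pooling, and that cosine similarity is the metric. The function name `cosine_similarity_map` and all array shapes are hypothetical.

```python
import numpy as np


def cosine_similarity_map(query_feat, support_feat, eps=1e-8):
    """Score every query location against one support example.

    query_feat:   array of shape (C, H, W), features of the query image.
    support_feat: array of shape (C, h, w), features of the support crop.
    Returns an (H, W) map of cosine similarities in [-1, 1].

    Illustrative assumption: the support is pooled into a single
    prototype vector; the paper's actual matching may differ.
    """
    # Global average pooling turns the support map into one C-dim prototype.
    prototype = support_feat.mean(axis=(1, 2))
    prototype = prototype / (np.linalg.norm(prototype) + eps)

    # L2-normalize the query features along the channel axis.
    norms = np.linalg.norm(query_feat, axis=0, keepdims=True)
    query_unit = query_feat / (norms + eps)

    # Channel-wise dot product gives the cosine similarity per location.
    return np.tensordot(prototype, query_unit, axes=([0], [0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels = 8

    # Synthetic query features with the "object" planted at row 2, column 3.
    object_vec = rng.normal(size=channels)
    query = rng.normal(size=(channels, 5, 5))
    query[:, 2, 3] = object_vec

    # The single support example shares the object's feature vector.
    support = np.tile(object_vec[:, None, None], (1, 2, 2))

    sim = cosine_similarity_map(query, support)
    peak = np.unravel_index(np.argmax(sim), sim.shape)
    print("peak location:", peak)
```

Because the support prototype is computed at inference time from a single example, a new class can be queried without updating any weights. In a detector, a similarity map of this kind would typically feed classification and box-regression heads rather than being used directly.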