CV | AI | Nov 1, 2023

Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection

arXiv:2311.00278v1 | 24 citations | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of detecting novel objects with limited labeled data; it is an incremental advance in computer vision.

The paper tackles few-shot object detection by introducing a method that re-scores classification scores using CLIP and a modified loss function, achieving substantial performance improvements over state-of-the-art approaches on MS-COCO and PASCAL VOC datasets.

Few-shot object detection, which focuses on detecting novel objects from few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss in the low-data setting. Specifically, we propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF), which extends Faster R-CNN by introducing a Calibration Module using CLIP (CM-CLIP) and a Background Negative Re-scale Loss (BNRL). The former adapts CLIP, which performs zero-shot classification, to re-score the classification scores of a detector using image-class similarities. The latter is a modified classification loss that accounts for the penalty on fake backgrounds as well as confusing categories on a generalized few-shot object detection dataset. Extensive experiments on MS-COCO and PASCAL VOC show that the proposed RISF substantially outperforms the state-of-the-art approaches. The code will be available.
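The re-scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's CM-CLIP module: the abstract does not state the fusion rule, so the linear blend, the weight `alpha`, the `temperature`, and the function names below are all assumptions for illustration. The toy 2-D vectors stand in for region and class-text embeddings that a model such as CLIP would produce.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rescore(det_scores, region_emb, class_embs, alpha=0.5, temperature=0.01):
    """Blend detector class scores with image-class similarity scores.

    ASSUMPTION: a linear blend weighted by `alpha`; the paper's actual
    calibration rule may differ.
    """
    sims = [cosine(region_emb, c) for c in class_embs]
    sim_probs = softmax(sims, temperature)
    return [(1 - alpha) * d + alpha * s for d, s in zip(det_scores, sim_probs)]

# Toy inputs (hypothetical): the detector slightly prefers class 0,
# while the embedding similarity clearly favors class 1.
det_scores = [0.5, 0.4, 0.1]
region_emb = [0.0, 1.0]
class_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

fused = rescore(det_scores, region_emb, class_embs)
best = max(range(len(fused)), key=fused.__getitem__)
print(best, [round(x, 3) for x in fused])
```

In this toy case the detector alone would pick class 0, but after blending with the similarity scores the top class becomes class 1, which is the kind of correction a similarity-based calibration is meant to provide when few labeled examples are available.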

Code Implementations: 1 repo
