CV · Oct 21, 2024

Few-shot target-driven instance detection based on open-vocabulary object detection models

arXiv:2410.16028v1 · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient few-shot object detection in computer vision, offering a practical, incremental solution that builds on existing open-vocabulary models.

The paper tackles the high cost of gradient-based retraining for few-shot object recognition. It proposes a lightweight method that adapts open-vocabulary object detection models into one-shot or few-shot models without textual descriptions, and its experiments show that performance improves with model size, the number of examples, and image augmentation.
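The adaptation described above can be pictured as replacing the text embedding of a class prompt with an embedding computed from the example images. The sketch below is a hypothetical illustration, not the paper's implementation: it assumes each example image (and each augmented copy of it) has already been encoded into a vector in the detector's shared vision-language space, and it averages the L2-normalised vectors into a single class prototype. The function names `l2_normalize` and `build_prototype` and the averaging rule itself are assumptions.

```python
import math
from typing import Sequence

Vector = list[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (assumes embeddings are compared by cosine)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def build_prototype(example_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical few-shot prototype: the mean of the normalised example
    embeddings, re-normalised so it can stand in for a text-prompt embedding."""
    if not example_embeddings:
        raise ValueError("need at least one example embedding")
    dim = len(example_embeddings[0])
    total = [0.0] * dim
    for emb in example_embeddings:
        if len(emb) != dim:
            raise ValueError("all embeddings must share one dimension")
        for i, x in enumerate(l2_normalize(emb)):
            total[i] += x
    mean = [x / len(example_embeddings) for x in total]
    return l2_normalize(mean)


# Toy 3-D embeddings: two example images plus one augmented view.
prototype = build_prototype([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
print(prototype)
```

Under this assumption, the one-shot case passes a single embedding, the few-shot case passes one per example, and image augmentation simply adds the embeddings of augmented copies to the same list, which is one plausible way more examples and augmentation could yield a more stable prototype.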

Current large open vision models could be useful for one- and few-shot object recognition. Nevertheless, gradient-based retraining solutions are costly. On the other hand, open-vocabulary object detection models bring visual and textual concepts closer together in the same latent space, allowing zero-shot detection via prompting at small computational cost. We propose a lightweight method to turn the latter into one-shot or few-shot object recognition models without requiring textual descriptions. Our experiments on the TEgO dataset using the YOLO-World model as a base show that performance increases with the model size, the number of examples, and the use of image augmentation.
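Because an open-vocabulary detector places visual and textual concepts in one latent space, prompting amounts to comparing each candidate region's embedding with a query embedding, with no gradient updates. The sketch below is a simplified, hypothetical illustration: it assumes region embeddings have already been produced by the detector (the YOLO-World matching head itself is not reproduced), scores them with plain cosine similarity, and keeps regions above an arbitrary threshold of 0.5. The real model's scoring, including any learned scaling or offset, may differ; `cosine` and `match_regions` are illustrative names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def match_regions(
    region_embeddings: Sequence[Sequence[float]],
    query: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> list[tuple[int, float]]:
    """Return (region index, score) pairs whose similarity to the query
    reaches the threshold, best match first."""
    scored = [(i, cosine(r, query)) for i, r in enumerate(region_embeddings)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda t: t[1], reverse=True)


# Toy example: the query could be a text-prompt embedding or a visual prototype.
regions = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
query = [1.0, 0.0, 0.0]
print(match_regions(regions, query))
```

The same scoring code serves both cases: the query can come from a text prompt (zero-shot) or from example images (one- or few-shot), which is why a shared latent space lets visual examples replace textual descriptions.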
