CV · Oct 13, 2023

Incremental Object Detection with CLIP

arXiv:2310.08815v3 · 2 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses incremental object detection in computer vision. It offers an incremental advance, improving both the detector's forward compatibility and its detection performance.

The paper tackles incremental object detection, where the same image can carry differently labeled bounding boxes at different learning stages, and this data ambiguity impairs the learning of new classes. The authors use CLIP to generate text feature embeddings for the class sets and use super-classes as stand-ins for not-yet-available novel classes to simulate the incremental scenario. The resulting method outperforms state-of-the-art approaches on the PASCAL VOC 2007 dataset, especially on new classes.

In contrast to the incremental classification task, the incremental detection task is characterized by data ambiguity, as an image may have differently labeled bounding boxes across multiple continuous learning stages. This phenomenon often impairs the model's ability to learn new classes effectively. Existing research, however, has paid little attention to the forward compatibility of the model, which limits its suitability for incremental learning. To overcome this obstacle, we propose leveraging a vision-language model such as CLIP to generate text feature embeddings for different class sets, which enhances the feature space globally. We then employ super-classes to replace the unavailable novel classes in the early learning stage to simulate the incremental scenario. Finally, we utilize the CLIP image encoder to accurately identify potential objects, and we incorporate the finely recognized detection boxes as pseudo-annotations into the training process, thereby further improving detection performance. We evaluate our approach under various incremental learning settings on the PASCAL VOC 2007 dataset, where it outperforms state-of-the-art methods, particularly in recognizing the new classes.
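The pseudo-annotation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings below are hand-written three-dimensional toy vectors standing in for CLIP text and image features, and the function names, the `animal` super-class entry, the temperature, and the confidence threshold are all assumptions chosen for illustration. The idea shown is only the general one: score each candidate box against the class text embeddings by cosine similarity, and keep the box as a pseudo-annotation when one class is clearly preferred.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_scores(region_emb, text_embs):
    """Cosine similarity between one region embedding and each class text embedding."""
    r = normalize(region_emb)
    return {
        name: sum(a * b for a, b in zip(r, normalize(t)))
        for name, t in text_embs.items()
    }

def softmax(scores, temperature=0.01):
    """Turn similarities into class probabilities (temperature is an assumed value)."""
    top = max(scores.values())
    exps = {k: math.exp((s - top) / temperature) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def pseudo_annotations(regions, text_embs, threshold=0.9):
    """Keep only boxes whose best class probability reaches the (assumed) threshold."""
    kept = []
    for box, emb in regions:
        probs = softmax(cosine_scores(emb, text_embs))
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            kept.append((box, label, probs[label]))
    return kept

# Toy stand-ins for CLIP text embeddings. "animal" plays the role of a
# hypothetical super-class placeholder for classes not yet available.
text_embs = {
    "dog": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "animal": [0.7, 0.0, 0.7],
}

# Toy stand-ins for CLIP image embeddings of candidate boxes (x1, y1, x2, y2).
regions = [
    ((10, 10, 50, 50), [0.9, 0.1, 0.0]),   # clearly closest to "dog"
    ((60, 20, 120, 80), [0.5, 0.5, 0.0]),  # equally close to "dog" and "car"
]

result = pseudo_annotations(regions, text_embs)
for box, label, prob in result:
    print(box, label, round(prob, 3))
```

In this toy run the first box is kept with the label `dog`, while the second is discarded because its probability mass is split between `dog` and `car`. In a real pipeline the vectors would come from a CLIP text encoder and image encoder, and the threshold would need tuning on held-out data.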

