CV · Jul 15, 2024

Anticipating Future Object Compositions without Forgetting

arXiv:2407.10723v2 · 1 citation · h-index: 19
Originality: Incremental advance
AI Analysis

This paper addresses compositional zero-shot learning in object detection for computer vision, and its contribution is rated as an incremental advance.

The paper tackles the problem of generalizing object detection to novel object-attribute compositions without forgetting previously learned knowledge. It reports a 70.5% improvement over Compositional Soft Prompting (CSP) in the harmonic mean (HM) between seen and unseen compositions on the CLEVR dataset, and a 14.5% increase in HM across the pretrain, increment, and unseen sets.

Despite significant advances in computer vision models, their ability to generalize to novel object-attribute compositions remains limited. Existing methods for Compositional Zero-Shot Learning (CZSL) mainly focus on image classification. This paper aims to enhance CZSL in object detection without forgetting previously learned knowledge. We build on Grounding DINO, incorporate Compositional Soft Prompting (CSP) into it, and extend it with Compositional Anticipation. We achieve a 70.5% improvement over CSP on the harmonic mean (HM) between seen and unseen compositions on the CLEVR dataset. Furthermore, we introduce Contrastive Prompt Tuning to incrementally address model confusion between similar compositions. We demonstrate the effectiveness of this method and achieve an increase of 14.5% in HM across the pretrain, increment, and unseen sets. Collectively, these methods provide a framework for learning various compositions with limited data, as well as improving the performance of underperforming compositions when additional data becomes available.
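
The harmonic mean (HM) used as the headline metric above can be illustrated with a minimal sketch. The function name `harmonic_mean` and the example scores are hypothetical and are not taken from the paper; the sketch only assumes that each split (for example seen and unseen, or pretrain, increment, and unseen) is summarized by one non-negative score.

```python
from typing import Sequence


def harmonic_mean(scores: Sequence[float]) -> float:
    """Harmonic mean of per-split scores (hypothetical helper, not the paper's code).

    Returns 0.0 if any score is 0, since a split with zero performance
    drives the harmonic mean to zero.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        return 0.0
    # n divided by the sum of reciprocals
    return len(scores) / sum(1.0 / s for s in scores)


# Illustrative values only (assumed, not reported results).
balanced = harmonic_mean([0.5, 0.5])        # seen and unseen perform equally
imbalanced = harmonic_mean([0.9, 0.1])      # strong on seen, weak on unseen
three_way = harmonic_mean([0.6, 0.6, 0.3])  # e.g. pretrain, increment, unseen

print(f"balanced:   {balanced:.3f}")
print(f"imbalanced: {imbalanced:.3f}")
print(f"three-way:  {three_way:.3f}")
```

Both two-split examples have the same arithmetic mean, yet the imbalanced pair scores far lower under HM. That is why HM is a natural choice here: a model cannot score well by excelling on seen compositions while failing on unseen ones.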

