CV · Dec 10, 2024

Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training

arXiv:2412.07161v1 · 1 citation · h-index: 13 · MM
Originality: Highly original
AI Analysis

This work addresses compositional zero-shot learning and builds on existing CLIP-based methods.

The paper tackles the recognition of unseen attribute-object combinations. It introduces a framework with two modules, one that improves the understanding of individual attributes and objects and one that improves how they are linked, and it reports state-of-the-art performance on three benchmark datasets in both closed-world and open-world scenarios.

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen combinations of seen attributes and objects. Current CLIP-based CZSL methods, despite their advances, often fail to understand and link attributes and objects effectively because of inherent limitations in CLIP's pretraining mechanisms. To address these shortcomings, this paper introduces a novel framework, Understanding and Linking Attributes and Objects (ULAO), which comprises two modules. The Understanding Attributes and Objects (UAO) module improves primitive understanding by predicting primitives sequentially and using recognized objects as contextual hints for attribute classification. The Linking Attributes and Objects (LAO) module improves the understanding of attribute-object linkage through a new contrastive learning strategy that incorporates tailored hard negative generation and adaptive loss adjustments. The model achieves state-of-the-art performance on three benchmark datasets in both Closed-World (CW) and Open-World (OW) scenarios.
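
To make the closed-world and open-world settings and the idea of hard negatives concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names `candidate_pairs` and `hard_negatives` are hypothetical, the toy vocabulary is invented, and the rule "a hard negative shares exactly one primitive with the target pair" is an assumption made for illustration, since the abstract does not specify how ULAO generates its negatives.

```python
from itertools import product


def candidate_pairs(attributes, objects, seen_pairs, unseen_pairs, open_world):
    """Return the (attribute, object) label space searched at test time.

    Closed-world: only the known seen and unseen compositions.
    Open-world: every attribute-object combination, feasible or not.
    """
    if open_world:
        return sorted(product(attributes, objects))
    return sorted(set(seen_pairs) | set(unseen_pairs))


def hard_negatives(target, candidates):
    """Return candidates that share exactly one primitive with the target.

    Assumption for illustration only: a negative is "hard" when it keeps
    either the attribute or the object of the target, but not both.
    """
    attr, obj = target
    return [(a, o) for (a, o) in candidates if (a == attr) != (o == obj)]


# Toy vocabulary (hypothetical, not taken from any benchmark).
attributes = ["wet", "dry"]
objects = ["dog", "cat"]
seen = [("wet", "dog"), ("dry", "cat")]
unseen = [("wet", "cat")]

closed = candidate_pairs(attributes, objects, seen, unseen, open_world=False)
opened = candidate_pairs(attributes, objects, seen, unseen, open_world=True)
negatives = hard_negatives(("wet", "cat"), opened)
```

In this toy case the closed-world label space holds three compositions while the open-world one holds all four, which hints at why the open-world scenario is harder: the search space grows with the product of the attribute and object vocabularies.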
