CV Aug 16, 2024

TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

arXiv:2408.08703v3, 8 citations, h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in CZSL for recognizing novel state-object compositions, representing an incremental advancement.

The paper tackles the challenge of aligning semantically similar multimodal representations and generalizing pre-trained knowledge to novel compositions in Compositional Zero-Shot Learning, proposing a Trisets Consistency Alignment framework that improves accuracy and speeds up inference.

Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically similar multimodal representations, as well as generalizing pre-trained knowledge to novel compositional contexts, remains an enduring challenge. In this paper, we revisit conditional transport (CT) theory and its homology to the visual-semantics interaction in CZSL, and propose a novel Trisets Consistency Alignment framework (dubbed TsCA) that addresses these issues. Concretely, we utilize three distinct yet semantically homologous sets, i.e., patches, primitives, and compositions, to construct pairwise CT costs that minimize their semantic discrepancies. To further ensure consistent transfer within these sets, we implement a cycle-consistency constraint that refines the learning by guaranteeing the feature consistency of the self-mapping during transport flow, regardless of modality. Moreover, we extend the CT plans to an open-world setting, which enables the model to effectively filter out infeasible pairs, thereby speeding up inference and increasing accuracy. Extensive experiments are conducted to verify the effectiveness of the proposed method.
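
To make the pairwise CT cost and the cycle-consistency idea from the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a cosine-based cost, a softmax "navigator" with a hypothetical `temperature` parameter, uniform weights over set elements, and a cross-entropy-style round-trip penalty. The function names `ct_cost` and `cycle_consistency_loss` are illustrative only.

```python
import numpy as np


def _normalize(x):
    # Scale each row to unit L2 norm so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _softmax(z, axis):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ct_cost(x, y, temperature=0.1):
    """Bidirectional conditional-transport cost between two feature sets.

    x: (n, d) array, e.g. patch features; y: (m, d) array, e.g. primitive
    or composition embeddings. Assumption: cost = 1 - cosine similarity,
    and transport plans are softmaxes of the scaled similarity.
    """
    x, y = _normalize(x), _normalize(y)
    sim = x @ y.T                                    # (n, m) cosine similarities
    cost = 1.0 - sim                                 # point-to-point transport cost
    fwd_plan = _softmax(sim / temperature, axis=1)   # pi(y_j | x_i), rows sum to 1
    bwd_plan = _softmax(sim / temperature, axis=0)   # pi(x_i | y_j), columns sum to 1
    n, m = sim.shape
    forward = (fwd_plan * cost).sum() / n            # expected cost from x to y
    backward = (bwd_plan * cost).sum() / m           # expected cost from y to x
    return forward + backward, fwd_plan, bwd_plan


def cycle_consistency_loss(fwd_plan, bwd_plan, eps=1e-12):
    """Penalize round trips x -> y -> x that do not return to the start.

    Assumption: the round-trip plan is the product of the forward and
    backward plans, and the penalty is the mean negative log-probability
    of each element mapping back to itself.
    """
    round_trip = fwd_plan @ bwd_plan.T               # (n, n), rows sum to 1
    self_prob = np.clip(np.diag(round_trip), eps, 1.0)
    return float(-np.log(self_prob).mean())
```

Under these assumptions, a triset setup would call `ct_cost` once per pair of sets (patches and primitives, patches and compositions, primitives and compositions) and sum the three costs, adding `cycle_consistency_loss` as a regularizer.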

