CV · Apr 20

Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels

arXiv:2604.17710 · 18.5 · h-index: 4
AI Analysis

For zero-shot learning practitioners, DVSA addresses the practical problem of label noise, offering a robust framework that the authors report performs better than existing methods under ambiguous supervision.

Zero-shot learning typically assumes clean labels, but real-world label noise degrades performance. DVSA combines a dynamic label disambiguation mechanism with bidirectional visual-semantic alignment and attribute-level contrastive optimization, and the authors report stronger performance under ambiguous labels on standard benchmarks.

Zero-shot learning (ZSL) aims to recognize unseen classes for which no visual training instances are available. However, existing methods usually assume clean labels, overlooking real-world label noise and ambiguity, which degrade performance. To bridge this gap, we propose Dynamic Visual-semantic Alignment (DVSA), a robust ZSL framework for learning from ambiguous labels. DVSA uses a bidirectional visual-semantic alignment module with attention to mutually calibrate visual features and attribute prototypes, and a contrastive optimization grounded in Mutual Information (MI) at the attribute level to strengthen discriminative, semantically consistent attributes. In addition, a dynamic label disambiguation mechanism iteratively corrects noisy supervision while preserving semantic consistency, narrowing the instance-label gap and improving generalization. Extensive experiments on standard benchmarks verify that DVSA achieves stronger performance under ambiguous supervision.
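The abstract does not spell out the disambiguation rule, so the following is only a minimal sketch of what one iterative label-disambiguation step could look like. It assumes that each training instance carries a candidate label set, and it borrows a moving-average confidence update that is common in partial-label learning. The function name `disambiguate`, the `momentum` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
import math


def disambiguate(confidence, scores, candidates, momentum=0.9):
    """One hypothetical update of an instance's label confidences.

    confidence: current confidence per class (zero outside the candidate set)
    scores:     model compatibility score per class, e.g. the similarity
                between a visual feature and each class's attribute prototype
    candidates: indices of the classes in the ambiguous (candidate) label set
    """
    # Softmax restricted to the candidate set; non-candidates get zero mass.
    peak = max(scores[c] for c in candidates)
    weights = {c: math.exp(scores[c] - peak) for c in candidates}
    total = sum(weights.values())
    target = [weights[c] / total if c in weights else 0.0
              for c in range(len(scores))]
    # A moving average keeps the correction gradual rather than abrupt.
    return [momentum * old + (1.0 - momentum) * new
            for old, new in zip(confidence, target)]


# Toy usage (assumed values): classes 0 and 2 are candidates, class 1 is not.
candidates = [0, 2]
confidence = [0.5, 0.0, 0.5]   # start uniform over the candidate set
scores = [2.0, 5.0, 0.5]       # class 1 scores highest but is not a candidate

for _ in range(20):
    confidence = disambiguate(confidence, scores, candidates)
```

In this sketch the confidence mass shifts toward candidate class 0 because it has the higher score among the candidates, while class 1 stays at zero no matter how high it scores. In a real training loop the scores would be recomputed from the model after each epoch instead of being held fixed.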
