CV | Apr 20

Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels

Jiangnan Li, Linqing Huang, Xiaowen Yan, Min Gan, Wenpeng Lu, Jinfu Fan

arXiv:2604.17710 | 18.5 | h-index: 4

AI Analysis

For zero-shot learning practitioners, DVSA addresses the practical problem of label noise, offering a robust framework that is reported to outperform existing methods under ambiguous supervision.

Zero-shot learning methods typically assume clean labels, but real-world label noise degrades their performance. DVSA combines a dynamic label disambiguation mechanism with bidirectional visual-semantic alignment and attribute-level contrastive optimization, and the authors report stronger performance under ambiguous labels on standard benchmarks.

Zero-shot learning (ZSL) aims to recognize unseen classes for which no visual instances are available during training. However, existing methods usually assume clean labels and overlook real-world label noise and ambiguity, which degrade performance. To bridge this gap, we propose Dynamic Visual-semantic Alignment (DVSA), a robust ZSL framework for learning from ambiguous labels. DVSA uses a bidirectional visual-semantic alignment module with attention to mutually calibrate visual features and attribute prototypes, and a contrastive optimization grounded in Mutual Information (MI) at the attribute level to strengthen discriminative, semantically consistent attributes. In addition, a dynamic label disambiguation mechanism iteratively corrects noisy supervision while preserving semantic consistency, narrowing the instance-label gap and improving generalization. Extensive experiments on standard benchmarks verify that DVSA achieves stronger performance under ambiguous supervision.
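
The abstract does not spell out how the dynamic label disambiguation step is computed. As a rough illustration only, the sketch below shows one common scheme from partial-label learning: each instance carries a set of candidate labels, starts with uniform weights over that set, and has its weights nudged toward the model's own predictions restricted to the candidates. This is an assumed, generic update rule, not the authors' method; the function names (`init_weights`, `update_weights`, `soft_label_loss`) and the `momentum` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_weights(candidates, num_classes):
    """Uniform weights over the candidate labels, zero everywhere else."""
    cands = sorted(set(candidates))
    weights = [0.0] * num_classes
    for c in cands:
        weights[c] = 1.0 / len(cands)
    return weights


def update_weights(weights, scores, candidates, momentum=0.9):
    """One disambiguation step (assumed rule, not taken from the paper).

    The model's softmax output is renormalised over the candidate set and
    blended into the current weights with a moving average, so labels
    outside the candidate set always keep weight zero.
    """
    cands = sorted(set(candidates))
    probs = softmax(scores)
    mass = sum(probs[c] for c in cands)
    if mass == 0.0:  # every candidate underflowed; keep the old weights
        return list(weights)
    new_weights = [0.0] * len(weights)
    for c in cands:
        target = probs[c] / mass
        new_weights[c] = momentum * weights[c] + (1.0 - momentum) * target
    return new_weights


def soft_label_loss(weights, scores):
    """Cross-entropy between the current soft labels and the model output."""
    probs = softmax(scores)
    return -sum(w * math.log(max(p, 1e-12))
                for w, p in zip(weights, probs) if w > 0.0)


# Toy usage: three classes, the true label is known to be class 0 or class 2.
candidates = [0, 2]
weights = init_weights(candidates, num_classes=3)
scores = [2.0, 5.0, 0.0]  # hypothetical model scores for one instance
for _ in range(20):
    weights = update_weights(weights, scores, candidates, momentum=0.5)
loss = soft_label_loss(weights, scores)
```

In this toy run the weight on class 0 grows at the expense of class 2 because the model scores class 0 higher, while class 1 stays at zero despite having the highest raw score, since it is not a candidate. In a real training loop the scores would change every epoch, and the paper's mechanism additionally aims to preserve semantic consistency, which this sketch does not model.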
