CV · AI · Mar 17

FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Alignment

arXiv:2505.11192 · 44.72 citations · h-index: 4
Predicted impact: top 74% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses a critical challenge in vision-language alignment, namely false negatives in contrastive pretraining, though it appears to be an incremental enhancement to existing frameworks.

The paper tackles the problem of false negatives degrading vision-language pretraining by proposing FALCON, a learning-based mini-batch construction strategy that adaptively balances hard and false negatives. Results show it significantly improves performance across three frameworks and various downstream tasks.

False negatives pose a critical challenge in vision-language pretraining (VLP) due to the many-to-many correspondence between images and texts in large-scale datasets. These false negatives introduce conflicting supervision signals that degrade the learned embedding space and diminish the effectiveness of hard negative sampling. In this paper, we propose FALCON (False-negative Aware Learning of COntrastive Negatives), a learning-based mini-batch construction strategy that adaptively balances the trade-off between hard and false negatives during VLP. Rather than relying on fixed heuristics, FALCON employs a negative mining scheduler that dynamically selects negative samples of appropriate hardness for each anchor instance during mini-batch construction, guided by a proxy for cross-modal alignment improvement. Experimental results demonstrate that FALCON significantly improves performance across three vision-language learning frameworks (ALBEF, BLIP-2, SigLIP-2) and a broad range of downstream tasks and evaluation settings, underscoring its effectiveness and robustness in mitigating the impact of false negatives.
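
The idea of picking negatives "of appropriate hardness" for each anchor during mini-batch construction can be illustrated with a small sketch. This is not the FALCON implementation: the function names, the use of a precomputed similarity matrix, and the single fixed `quantile` parameter are all assumptions made for illustration. In the paper the hardness is chosen per anchor by a learned scheduler; here it is a constant supplied by the caller.

```python
import numpy as np


def select_negative_by_hardness(sim, anchor, quantile, exclude=()):
    """Pick the candidate whose similarity to `anchor` sits at `quantile`.

    quantile=1.0 returns the hardest (most similar) candidate, which is also
    the one most likely to be a false negative; quantile=0.0 returns the
    easiest. Hypothetical helper, not taken from the paper.
    """
    excluded = set(exclude) | {anchor}
    candidates = [j for j in range(sim.shape[0]) if j not in excluded]
    if not candidates:
        raise ValueError("no candidates left to select from")
    candidates = np.array(candidates)
    # Sort candidates from easiest (low similarity) to hardest (high similarity).
    order = candidates[np.argsort(sim[anchor, candidates])]
    pos = int(round(quantile * (len(order) - 1)))
    return int(order[pos])


def build_mini_batch(sim, seed_anchor, batch_size, quantile):
    """Greedily chain samples: each new sample is a negative of the previous one.

    A fixed `quantile` stands in for the learned scheduler (assumption).
    """
    batch = [seed_anchor]
    while len(batch) < batch_size:
        anchor = batch[-1]
        batch.append(
            select_negative_by_hardness(sim, anchor, quantile, exclude=batch)
        )
    return batch


# Toy symmetric similarity matrix over four samples (made-up values).
sim = np.array([
    [1.0, 0.9, 0.2, 0.5],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.5, 0.4, 0.8, 1.0],
])
hardest_batch = build_mini_batch(sim, seed_anchor=0, batch_size=3, quantile=1.0)
```

Lowering `quantile` trades hard negatives for a lower risk of false negatives. The abstract describes FALCON as learning this trade-off rather than fixing it by heuristic, which this sketch does not attempt.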

