CV · Sep 19, 2025

PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning

arXiv:2509.15623v1 · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses noisy correspondences in cross-modal retrieval, with applications such as multimedia search. It is an incremental advance: it builds on existing noisy-correspondence methods and refines how noisy samples are handled.

The paper tackles noisy correspondences in cross-modal retrieval, where misaligned image-text pairs degrade performance. It introduces the PCSR framework, which refines noisy samples based on pseudo-label consistency, and reports improved retrieval robustness on CC152K, MS-COCO, and Flickr30K.

Cross-modal retrieval aims to align different modalities via semantic similarity. However, existing methods often assume that image-text pairs are perfectly aligned, overlooking noisy correspondences in real data. These misaligned pairs misguide similarity learning and degrade retrieval performance. Previous methods often rely on coarse-grained categorizations that simply divide data into clean and noisy samples, overlooking the intrinsic diversity within noisy instances. Moreover, they typically apply uniform training strategies regardless of sample characteristics, resulting in suboptimal sample utilization for model optimization. To address these challenges, we introduce a novel framework, called Pseudo-label Consistency-Guided Sample Refinement (PCSR), which enhances correspondence reliability by explicitly dividing samples based on pseudo-label consistency. Specifically, we first employ a confidence-based estimation to distinguish clean and noisy pairs, then refine the noisy pairs via pseudo-label consistency to uncover structurally distinct subsets. We further propose a Pseudo-label Consistency Score (PCS) to quantify prediction stability, enabling the separation of ambiguous and refinable samples within noisy pairs. Accordingly, we adopt Adaptive Pair Optimization (APO), where ambiguous samples are optimized with robust loss functions and refinable ones are enhanced via text replacement during training. Extensive experiments on CC152K, MS-COCO, and Flickr30K validate the effectiveness of our method in improving retrieval robustness under noisy supervision.
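
The two-stage division described in the abstract (clean vs. noisy by confidence, then refinable vs. ambiguous by pseudo-label consistency) can be illustrated with a rough sketch. The abstract does not give the formula for PCS or any thresholds, so everything concrete here is an assumption made for illustration: the names `consistency_score` and `partition`, the threshold values, and the choice to score consistency as the fraction of recent pseudo-labels that agree with the most frequent one. It should not be read as the authors' implementation.

```python
from collections import Counter


def consistency_score(pseudo_labels):
    """Illustrative stand-in for PCS (assumed definition, not the paper's):
    the fraction of recorded pseudo-labels that match the most frequent one."""
    if not pseudo_labels:
        return 0.0
    _, top_count = Counter(pseudo_labels).most_common(1)[0]
    return top_count / len(pseudo_labels)


def partition(confidences, label_histories, conf_threshold=0.5, pcs_threshold=0.8):
    """Split sample indices into clean / refinable / ambiguous groups.

    confidences     : per-pair clean-confidence in [0, 1] (how it is estimated is assumed)
    label_histories : per-pair list of pseudo-labels recorded over recent epochs
    Both thresholds are hypothetical values chosen for this example.
    """
    groups = {"clean": [], "refinable": [], "ambiguous": []}
    for idx, (conf, history) in enumerate(zip(confidences, label_histories)):
        if conf >= conf_threshold:
            # Stage 1: confident pairs are treated as clean.
            groups["clean"].append(idx)
        elif consistency_score(history) >= pcs_threshold:
            # Stage 2a: noisy but stable predictions -> candidate for text replacement.
            groups["refinable"].append(idx)
        else:
            # Stage 2b: noisy and unstable -> would be trained with a robust loss.
            groups["ambiguous"].append(idx)
    return groups


# Toy inputs: three image-text pairs with made-up confidences and pseudo-label histories.
confidences = [0.9, 0.2, 0.1]
histories = [[3, 3, 3, 3], [7, 7, 7, 7], [1, 4, 2, 4]]
groups = partition(confidences, histories)
print(groups)
```

In this toy input, the first pair is kept as clean, the second is noisy but has a stable pseudo-label (so it would be a candidate for text replacement under APO), and the third is noisy with unstable predictions (so it would fall to the robust-loss branch).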

