CVJan 18, 2024

CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Retrieval

arXiv:2401.10011v33 citations
Originality Incremental advance
AI Analysis

This work addresses the practical problem of retrieving person images using text descriptions without identity annotations, offering a novel method that is incremental but shows strong gains in a domain-specific application.

The paper tackles weakly supervised text-based person retrieval by addressing intra-class differences and cross-modal semantic gaps, proposing CPCL which achieves state-of-the-art results on benchmarks like CUHK-PEDES and ICFG-PEDES with mAP improvements of 2.1% and 3.5%, respectively.

Weakly supervised text-based person retrieval seeks to retrieve images of a target person using textual descriptions, without relying on identity annotations and is more challenging and practical. The primary challenge is the intra-class differences, encompassing intra-modal feature variations and cross-modal semantic gaps. Prior works have focused on instance-level samples and ignored prototypical features of each person which are intrinsic and invariant. Toward this, we propose a Cross-Modal Prototypical Contrastive Learning (CPCL) method. In practice, the CPCL introduces the CLIP model to weakly supervised text-based person retrieval to map visual and textual instances into a shared latent space. Subsequently, the proposed Prototypical Multi-modal Memory (PMM) module captures associations between heterogeneous modalities of image-text pairs belonging to the same person through the Hybrid Cross-modal Matching (HCM) module in a many-to-many mapping fashion. Moreover, the Outlier Pseudo Label Mining (OPLM) module further distinguishes valuable outlier samples from each modality, enhancing the creation of more reliable clusters by mining implicit relationships between image-text pairs. We conduct extensive experiments on popular benchmarks of weakly supervised text-based person retrieval, which validate the effectiveness, generalizability of CPCL.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes