CV, LG | Jul 17, 2025

Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

arXiv:2507.12998v1 | 5 citations | h-index: 21 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the efficiency and noise issues in multimodal contrastive learning, offering a practical solution for researchers and practitioners dealing with large-scale data, though it is incremental in improving sample selection techniques.

The paper tackles the problem of noisy correspondence in multimodal contrastive learning by proposing a differential-informed sample selection method (DISSect), which accelerates training and achieves consistent superiority over state-of-the-art methods on three benchmark datasets.

The remarkable success of contrastive-learning-based multimodal models has been driven largely by training on ever-larger datasets at high compute cost. Sample selection offers an efficient alternative paradigm and is an important direction for accelerating training. However, recent sample selection methods either rely on an oracle model to select a high-quality coreset offline, which limits them in cold-start scenarios, or perform online selection based on real-time model predictions, which does not sufficiently or efficiently account for noisy correspondence. To address this dilemma, we propose a novel Differential-Informed Sample Selection (DISSect) method, which accurately and efficiently identifies noisy correspondence to accelerate training. Specifically, we rethink the impact of noisy correspondence on contrastive learning and propose that the differential between the predicted correlation of the current model and that of a historical model is more informative for characterizing sample quality. Based on this, we construct a robust differential-based sample selection method and analyze it theoretically. Extensive experiments on three benchmark datasets and various downstream tasks demonstrate the consistent superiority of DISSect over current state-of-the-art methods. Source code is available at: https://github.com/MediaBrain-SJTU/DISSect.
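The core idea in the abstract, scoring each image-text pair by the differential between the current model's predicted correlation and a historical model's, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the use of cosine similarity as the "predicted correlation", the `keep_ratio` parameter, and the `prefer` switch are all assumptions made for illustration. In particular, the abstract does not state whether low or high differentials indicate clean pairs, so the direction is left as a parameter.

```python
import numpy as np


def cosine_scores(img_emb, txt_emb):
    """Per-pair cosine similarity between image and text embeddings.

    Assumption: cosine similarity stands in for the 'predicted correlation'
    mentioned in the abstract.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return np.sum(img * txt, axis=1)


def select_by_differential(current_scores, historical_scores,
                           keep_ratio=0.5, prefer="low"):
    """Keep a fraction of the batch ranked by the score differential.

    differential = current score - historical score, per sample.
    `prefer` is a hypothetical switch: "low" keeps the smallest
    differentials, "high" keeps the largest. The source does not say
    which direction marks clean pairs.
    """
    current_scores = np.asarray(current_scores, dtype=float)
    historical_scores = np.asarray(historical_scores, dtype=float)
    diff = current_scores - historical_scores

    # Number of samples to keep, at least one.
    k = max(1, int(round(keep_ratio * diff.shape[0])))

    order = np.argsort(diff, kind="stable")  # ascending differential
    if prefer == "high":
        order = order[::-1]

    # Return kept indices in their original batch order, plus the differentials.
    return np.sort(order[:k]), diff


# Toy batch of four pairs: scores from the current and an earlier checkpoint.
current = [0.9, 0.5, 0.8, 0.3]
historical = [0.8, 0.1, 0.75, 0.0]
kept, diff = select_by_differential(current, historical, keep_ratio=0.5)
print(kept)
```

In an online setting, the historical scores would presumably come from an earlier checkpoint or a cached score from a previous epoch, and only the kept indices would contribute to the contrastive loss for that step. How the historical model is chosen is not specified in the abstract.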

Code Implementations: 1 repo