CV, Feb 12, 2025

A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters

arXiv:2502.08134v1, 1 citation, h-index: 6
Originality: Synthesis-oriented
AI Analysis

This survey is aimed at researchers and practitioners who rely on contrastive pre-training for downstream tasks and want to make that pre-training more effective.

The authors address data curation for visual contrastive learning, arguing that the way positive and negative pairs are constructed affects both representation quality and training efficiency.

Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solving downstream tasks, data curation becomes essential for optimizing its effectiveness. In this survey, we attempt to create a taxonomy of existing techniques for positive and negative pair curation in contrastive learning, and describe them in detail.
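To make the role of positive and negative pairs concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss in NumPy. It is an illustration only and is not taken from the survey: the function name `info_nce_loss`, the temperature of 0.1, and the use of other in-batch samples as negatives are all assumptions chosen for the example.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE-style loss (hypothetical helper, not from the survey).

    Row i of `positives` is treated as the positive for row i of `anchors`;
    all other rows in the batch act as negatives for that anchor.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (N, N) matrix of scaled similarities between every anchor and candidate.
    logits = a @ p.T / temperature
    # Subtract the row-wise max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the anchor-positive pairs; average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))

# Well-chosen positives: slightly perturbed copies of each anchor.
aligned = info_nce_loss(x, x + 0.01 * rng.normal(size=x.shape))
# Poorly chosen positives: unrelated random vectors.
unrelated = info_nce_loss(x, rng.normal(size=x.shape))
```

In this toy setup the loss is lower when each positive really resembles its anchor than when the "positives" are unrelated vectors, which is the sense in which pair design shapes the training signal.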
