LG · ML · Jun 11, 2025

On the Similarities of Embeddings in Contrastive Learning

arXiv:2506.09781v2 · 2 citations · h-index: 15 · ICML
Originality: Incremental advance
AI Analysis

This work analyzes contrastive learning through the cosine similarities of embeddings and offers an incremental improvement: an auxiliary loss that improves representation quality when training with small batches.

The paper analyzes contrastive learning in terms of cosine similarities and reports two findings. First, in full-batch settings, perfect alignment of positive pairs is unattainable when negative-pair similarities fall below a threshold. Second, in mini-batch settings, smaller batch sizes lead to higher variance in negative-pair similarities, which degrades representation quality. To counter the second effect, the authors propose an auxiliary loss that reduces this variance and improves performance in small-batch settings.

Contrastive learning operates on a simple yet effective principle: Embeddings of positive pairs are pulled together, while those of negative pairs are pushed apart. In this paper, we propose a unified framework for understanding contrastive learning through the lens of cosine similarity, and present two key theoretical insights derived from this framework. First, in full-batch settings, we show that perfect alignment of positive pairs is unattainable when negative-pair similarities fall below a threshold, and this misalignment can be mitigated by incorporating within-view negative pairs into the objective. Second, in mini-batch settings, smaller batch sizes induce stronger separation among negative pairs in the embedding space, i.e., higher variance in their similarities, which in turn degrades the quality of learned representations compared to full-batch settings. To address this, we propose an auxiliary loss that reduces the variance of negative-pair similarities in mini-batch settings. Empirical results show that incorporating the proposed loss improves performance in small-batch settings.
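To make the mini-batch idea concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss combined with a penalty on the variance of negative-pair cosine similarities. The abstract does not give the exact form of the proposed auxiliary loss, so the penalty here (plain variance over cross-view negative pairs) is an assumption. The function names, the `temperature` and `var_weight` parameters, and the NumPy setup are likewise hypothetical and only illustrative.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit norm so that dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def negative_similarity_variance(u, v):
    # Variance of cross-view cosine similarities over negative pairs (i != j).
    # Assumed form of the variance term; the paper's exact loss may differ.
    u, v = l2_normalize(u), l2_normalize(v)
    sims = u @ v.T
    off_diagonal = ~np.eye(len(u), dtype=bool)
    return sims[off_diagonal].var()

def info_nce(u, v, temperature=0.5):
    # Standard one-directional InfoNCE: row i of u should match row i of v.
    u, v = l2_normalize(u), l2_normalize(v)
    logits = (u @ v.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def total_loss(u, v, temperature=0.5, var_weight=1.0):
    # Contrastive loss plus the (assumed) variance penalty on negative pairs.
    return info_nce(u, v, temperature) + var_weight * negative_similarity_variance(u, v)

# Toy batch: 8 samples, 16-dimensional embeddings, two noisy views.
rng = np.random.default_rng(0)
u = rng.normal(size=(8, 16))
v = u + 0.1 * rng.normal(size=(8, 16))
print(total_loss(u, v))
```

Setting `var_weight=0.0` recovers the plain contrastive loss, which makes it straightforward to compare training with and without the penalty at a given batch size.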

Code Implementations: 1 repo
