LG, ML · Jul 21, 2021

On the Memorization Properties of Contrastive Learning

arXiv:2107.10143v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work examines how contrastive learning methods such as SimCLR memorize training data, a question relevant to machine learning researchers who want to improve training approaches. It appears incremental, since it builds on existing memorization studies.

The paper investigates the memorization properties of SimCLR, a contrastive self-supervised learning method, and compares them with those of supervised learning and of training on random labels. It finds that training objects and augmentations differ in how complex they are for SimCLR to learn, and that SimCLR's distribution of training-object complexity resembles that of random-labels training.

Memorization studies of deep neural networks (DNNs) help us understand which patterns DNNs learn and how they learn them, and they motivate improvements to DNN training approaches. In this work, we investigate the memorization properties of SimCLR, a widely used contrastive self-supervised learning approach, and compare them with those of supervised learning and random-labels training. We find that both training objects and augmentations may differ in complexity, in terms of how SimCLR learns them. Moreover, we show that SimCLR is similar to random-labels training in its distribution of training-object complexity.
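The abstract does not spell out how "complexity" is measured. As a hedged illustration only, the sketch below shows SimCLR's standard NT-Xent (normalized temperature-scaled cross-entropy) objective together with one plausible per-object proxy: whether each object's second augmented view is the nearest neighbour of its first view, tracked across epochs. The function names, the temperature default, and the fraction-of-epochs score are assumptions for illustration, not necessarily the authors' exact metric.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (SimCLR compares embeddings by cosine similarity)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cosine_matrix(z: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all 2N view embeddings."""
    unit = [_normalize(v) for v in z]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def nt_xent(z1, z2, temperature: float = 0.5) -> float:
    """NT-Xent loss averaged over all 2N anchors.

    z1[i] and z2[i] embed two augmentations of object i. For each anchor, the
    positive is the other view of the same object; the remaining 2N - 2 views
    act as negatives. The temperature default is an assumption.
    """
    n = len(z1)
    sim = _cosine_matrix(list(z1) + list(z2))
    total = 0.0
    for k in range(2 * n):
        pos = (k + n) % (2 * n)  # index of the partner view
        pos_logit = sim[k][pos] / temperature
        denom = sum(math.exp(sim[k][j] / temperature) for j in range(2 * n) if j != k)
        total += -(pos_logit - math.log(denom))
    return total / (2 * n)


def positive_identified(z1, z2) -> List[bool]:
    """Hypothetical per-object check: is view 2 the most similar of all other views to view 1?"""
    n = len(z1)
    sim = _cosine_matrix(list(z1) + list(z2))
    result = []
    for i in range(n):
        candidates = [j for j in range(2 * n) if j != i]
        best = max(candidates, key=lambda j: sim[i][j])
        result.append(best == i + n)
    return result


def learning_score(history: List[List[bool]]) -> List[float]:
    """Hypothetical complexity proxy: fraction of recorded epochs in which each object
    was identified correctly. Lower scores suggest objects that are harder to learn."""
    epochs = len(history)
    n = len(history[0])
    return [sum(h[i] for h in history) / epochs for i in range(n)]


if __name__ == "__main__":
    z1 = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(positive_identified(z1, aligned))  # [True, True]
    print(nt_xent(z1, aligned) < nt_xent(z1, swapped))  # True
    print(learning_score([[True, False], [True, True]]))  # [1.0, 0.5]
```

Under these assumptions, one would record `positive_identified` once per epoch during training and pass the history to `learning_score`; comparing the resulting score distributions across SimCLR, supervised, and random-labels runs is the kind of analysis the abstract describes.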

