CV AI LG Nov 29, 2021

Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning

Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault, Stéphane Canu

arXiv:2111.14585v2 10.638 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the degraded quality of learned relations between instances in self-supervised representation learning for computer vision, offering an incremental improvement over existing contrastive methods.

The paper tackles contrastive learning's treatment of semantically similar instances as negatives by proposing Similarity Contrastive Estimation (SCE), a soft contrastive learning method that uses semantic similarity distributions. It reports 72.1% Top-1 accuracy on the ImageNet linear evaluation protocol at 100 pretraining epochs and 75.4% at 200 epochs with multi-crop.

Contrastive representation learning has proven to be an effective self-supervised learning method. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be contrasted with other instances, called negatives, that are considered as noise. However, several instances in a dataset are drawn from the same distribution and share underlying semantic information. A good data representation should contain relations, or semantic similarity, between the instances. Contrastive learning implicitly learns relations but considering all negatives as noise harms the quality of the learned relations. To circumvent this issue, we propose a novel formulation of contrastive learning using semantic similarity between instances called Similarity Contrastive Estimation (SCE). Our training objective is a soft contrastive learning one. Instead of hard classifying positives and negatives, we estimate from one view of a batch a continuous distribution to push or pull instances based on their semantic similarities. This target similarity distribution is sharpened to eliminate noisy relations. The model predicts for each instance, from another view, the target distribution while contrasting its positive with negatives. Experimental results show that SCE is Top-1 on the ImageNet linear evaluation protocol at 100 pretraining epochs with 72.1% accuracy and is competitive with state-of-the-art algorithms by reaching 75.4% for 200 epochs with multi-crop. We also show that SCE is able to generalize to several tasks. Source code is available here: https://github.com/CEA-LIST/SCE.
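
The soft objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function name `soft_contrastive_loss`, the mixing coefficient `lam`, the two temperatures `tau` and `tau_target`, and the choice to sharpen the target with a lower temperature are all assumptions made for illustration; the abstract only states that a sharpened similarity distribution is estimated from one view and predicted from the other while the positive is contrasted with negatives.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf get probability 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x):
    # Unit-normalize each row so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def soft_contrastive_loss(z_pred, z_target, tau=0.1, tau_target=0.05, lam=0.5):
    """Hypothetical soft contrastive loss for a batch of n >= 2 instances.

    z_pred:   (n, d) embeddings of one view (the predicting branch).
    z_target: (n, d) embeddings of the other view (the target branch).
    lam, tau, tau_target are assumed hyperparameters, not taken from the source.
    """
    z_pred = l2_normalize(z_pred)
    z_target = l2_normalize(z_target)
    n = z_pred.shape[0]
    eye = np.eye(n)

    # Target: similarity of each instance to the *other* instances in the
    # target view, sharpened by a low temperature; self-similarity is masked.
    sim_t = z_target @ z_target.T / tau_target
    sim_t = np.where(eye == 1, -np.inf, sim_t)
    relations = softmax(sim_t, axis=1)

    # Mix the one-hot positive (contrastive part) with the soft relations.
    target = lam * eye + (1.0 - lam) * relations

    # Prediction: log-softmax of cross-view similarities.
    logits = z_pred @ z_target.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy between the soft target and the predicted distribution.
    loss = -(target * log_p).sum(axis=1).mean()
    return loss, target

# Usage with random stand-in embeddings (no real encoder involved).
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
z2 = rng.normal(size=(8, 16))
loss, target = soft_contrastive_loss(z2, z1)
```

In this sketch, setting `lam=1.0` makes the target one-hot, so the loss reduces to a standard hard contrastive (InfoNCE-style) cross-entropy. Smaller values of `lam` give more weight to the estimated relations between instances.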

