Learning the Relation between Similarity Loss and Clustering Loss in Self-Supervised Learning
This work addresses a bottleneck in self-supervised learning for computer vision by improving feature representation, though it appears incremental as it builds on existing contrastive learning methods.
The paper tackles the problem of limited instance-level information in self-supervised learning by leveraging the similarity between distinct images. It also analyzes the relation between similarity loss and feature-level cross-entropy loss, and reports state-of-the-art results from a suitable combination of the two.
Self-supervised learning enables networks to learn discriminative features directly from large amounts of unlabeled data. Most state-of-the-art methods are based on contrastive learning and maximize the similarity between two augmentations of the same image. By exploiting the consistency between the two augmentations, these methods remove the burden of manual annotation. Contrastive learning exploits instance-level information to learn robust features. However, the learned information is probably confined to different views of the same instance. In this paper, we attempt to leverage the similarity between two distinct images to boost representation in self-supervised learning. Compared with instance-level information, the similarity between two distinct images may provide more useful information. In addition, we analyze the relation between similarity loss and feature-level cross-entropy loss. These two losses are essential for most deep learning methods, yet the relation between them is not clear. Similarity loss helps obtain instance-level representation, while feature-level cross-entropy loss helps mine the similarity between two distinct images. We provide theoretical analyses and experiments to show that a suitable combination of these two losses can achieve state-of-the-art results. Code is available at https://github.com/guijiejie/ICCL.
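To make the idea of combining the two losses concrete, the following is a minimal sketch in plain Python and not the implementation from the repository above. It assumes that the similarity loss is one minus the cosine similarity between the embeddings of two augmented views, and that the feature-level cross-entropy treats a softmax over the feature dimensions as a soft cluster assignment. Both definitions, the function names, and the `weight` parameter are illustrative assumptions; the abstract does not specify them.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(x):
    """Numerically stable softmax over the feature dimensions."""
    m = max(x)
    exps = [math.exp(a - m) for a in x]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_loss(z1, z2):
    """Assumed form: mean of (1 - cosine similarity) over paired views."""
    return sum(1.0 - cosine_similarity(a, b) for a, b in zip(z1, z2)) / len(z1)

def feature_cross_entropy(z1, z2):
    """Assumed form: softmax over features acts as a soft cluster
    assignment; view 1 supplies the target distribution for view 2."""
    total = 0.0
    for a, b in zip(z1, z2):
        p = softmax(a)  # target assignment from view 1
        q = softmax(b)  # predicted assignment from view 2
        total += -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return total / len(z1)

def combined_loss(z1, z2, weight=1.0):
    """Weighted sum of the two losses; `weight` is a hypothetical knob."""
    return similarity_loss(z1, z2) + weight * feature_cross_entropy(z1, z2)

# Toy batch: two images, two augmented views each, 3-dimensional features.
view1 = [[1.0, 0.2, -0.5], [0.1, 2.0, 0.3]]
view2 = [[0.9, 0.1, -0.4], [0.0, 1.8, 0.5]]
loss = combined_loss(view1, view2, weight=0.5)
```

Under these assumptions, the similarity term only compares views of the same image, whereas the cross-entropy term pushes each image toward a shared set of feature-level assignments, so distinct images with similar assignments end up grouped together. In practice the embeddings would come from a network and the losses would be written in a framework with automatic differentiation.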