CV · LG · Nov 20, 2020

Exploring Simple Siamese Representation Learning

arXiv:2011.10566v1 · 4996 citations
AI Analysis

This work simplifies unsupervised representation learning in computer vision by identifying the minimal set of components that Siamese networks need to learn effective representations.

This paper explores Siamese networks for unsupervised visual representation learning, demonstrating that meaningful representations can be learned without negative sample pairs, large batches, or momentum encoders. The key finding is the essential role of a stop-gradient operation in preventing collapsing solutions, with the proposed "SimSiam" method achieving competitive results on ImageNet and downstream tasks.
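The role of stop-gradient is easiest to see in the loss itself. Below is a minimal sketch, assuming the symmetrized negative-cosine-similarity loss from the SimSiam paper, L = D(p1, stopgrad(z2))/2 + D(p2, stopgrad(z1))/2, where p are predictor outputs and z are encoder outputs for two augmented views. Plain Python stands in for an autodiff framework: stop-gradient is modeled by differentiating only with respect to p and treating z as a constant, and updating p directly stands in for updating the network parameters that produce it. The helper names are hypothetical and not taken from the authors' code.

```python
import math

# Hypothetical helpers for illustration; not taken from the SimSiam code release.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def neg_cosine(p, z):
    """D(p, z) = -cos(p, z). z plays the role of stopgrad(z): a constant."""
    return -dot(p, z) / (norm(p) * norm(z))

def grad_neg_cosine_wrt_p(p, z):
    """Gradient of D(p, z) with respect to p only; nothing flows into z."""
    n_p, n_z = norm(p), norm(z)
    c = dot(p, z)
    return [-(zi / (n_p * n_z) - c * pi / (n_p ** 3 * n_z)) for pi, zi in zip(p, z)]

def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized loss: D(p1, sg(z2)) / 2 + D(p2, sg(z1)) / 2."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

# Toy predictor (p) and encoder (z) outputs for two augmented views.
p1, p2 = [1.0, 0.0], [0.0, 1.0]
z1, z2 = [0.8, 0.6], [0.6, 0.8]

before = simsiam_loss(p1, p2, z1, z2)
lr = 0.1
# dL/dp1 = 0.5 * dD(p1, z2)/dp1; z1 and z2 receive no update (stop-gradient).
g1 = grad_neg_cosine_wrt_p(p1, z2)
g2 = grad_neg_cosine_wrt_p(p2, z1)
p1 = [a - lr * 0.5 * g for a, g in zip(p1, g1)]
p2 = [a - lr * 0.5 * g for a, g in zip(p2, g2)]
after = simsiam_loss(p1, p2, z1, z2)
print(before, after)  # after is lower: each p moved toward the fixed z of the other view
```

Because each z is held constant, the only way to lower the loss is to move p toward the other view's representation; the sketch does not model the encoder or predictor networks, or batch averaging.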

Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning. Code will be made available.
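Whether training has collapsed can be checked from the outputs alone. A minimal sketch, assuming a diagnostic similar to the one used in the SimSiam paper: take the per-channel standard deviation of l2-normalized outputs and average it over channels. If every sample maps to the same vector, this is 0; for zero-mean isotropic Gaussian outputs of dimension d it is about 1/sqrt(d). The function name `collapse_std` and the toy data are hypothetical.

```python
import math
import random

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def collapse_std(outputs):
    """Mean over channels of the std of l2-normalized outputs; near 0 signals collapse."""
    normed = [l2_normalize(z) for z in outputs]
    n, d = len(normed), len(normed[0])
    stds = []
    for j in range(d):
        col = [z[j] for z in normed]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        stds.append(math.sqrt(var))
    return sum(stds) / d

random.seed(0)
d = 256
collapsed = [[1.0] * d for _ in range(512)]  # every sample maps to the same vector
healthy = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(512)]
print(collapse_std(collapsed))                  # 0.0
print(collapse_std(healthy), 1 / math.sqrt(d))  # both close to 0.0625
```

Normalizing first matters: it removes scale, so a shrinking output norm cannot masquerade as collapse, and a constant output cannot hide behind a large norm.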

Code Implementations: 26 repos

Data from Papers with Code (CC-BY-SA-4.0)

