Representation Learning via Consistent Assignment of Views to Clusters
This work addresses the problem of learning effective visual representations without labels for computer vision applications, and represents an incremental improvement over existing contrastive and clustering methods.
The paper tackles unsupervised visual representation learning by introducing CARL, which combines contrastive learning with deep clustering to enforce consistent prototype assignments across different views of an image. It reports results that surpass competing methods on several benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.
We introduce Consistent Assignment for Representation Learning (CARL), an unsupervised method that learns visual representations by combining ideas from self-supervised contrastive learning and deep clustering. Viewing contrastive learning from a clustering perspective, CARL learns a set of general prototypes that serve as energy anchors, so that different views of a given image are assigned to the same prototype. Unlike contemporary work on contrastive learning with deep clustering, CARL learns the set of general prototypes online, using gradient descent, without requiring non-differentiable algorithms or K-Means to solve the cluster assignment problem. CARL surpasses its competitors in many representation learning benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.
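The consistent-assignment idea described in the abstract can be illustrated with a small sketch. This is not the paper's exact objective: the dot-product scoring, the softmax assignment, the `-log <p_a, p_b>` form of the loss, and all names (`assign`, `consistency_loss`, the toy `prototypes` and embeddings) are assumptions chosen for illustration. The prototypes are fixed here, whereas in a real setup they and the encoder would be trained jointly by gradient descent in an autodiff framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign(embedding, prototypes):
    """Soft assignment of one embedding over K prototypes.

    Scores are plain dot products (an assumption made for this sketch).
    """
    logits = [sum(e * p for e, p in zip(embedding, proto)) for proto in prototypes]
    return softmax(logits)


def consistency_loss(z_a, z_b, prototypes):
    """Hypothetical consistency loss: -log of the inner product of the
    two assignment distributions. It is small when both embeddings put
    most of their mass on the same prototype."""
    p_a = assign(z_a, prototypes)
    p_b = assign(z_b, prototypes)
    agreement = sum(a * b for a, b in zip(p_a, p_b))
    return -math.log(agreement)


# Toy setup: two prototypes in a 2-D embedding space (illustrative values).
prototypes = [[4.0, 0.0], [0.0, 4.0]]
view_1 = [1.0, 0.1]   # embedding of one augmented view of an image
view_2 = [0.9, 0.0]   # embedding of a second view of the same image
other = [0.0, 1.0]    # embedding of a different image

loss_same = consistency_loss(view_1, view_2, prototypes)
loss_diff = consistency_loss(view_1, other, prototypes)
print(f"two views of one image: {loss_same:.3f}")
print(f"views of different images: {loss_diff:.3f}")
```

Because every step above is differentiable with respect to both the embeddings and the prototypes, a loss of this kind can be minimized directly with gradient descent, which is the property the abstract contrasts with K-Means-style assignment. A loss like this alone would admit a collapsed solution in which every view maps to one prototype, so a practical objective would need an additional term that spreads assignments across prototypes; that term is omitted here.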