CV · LG · Nov 11, 2022

Masked Contrastive Representation Learning

arXiv:2211.06012v1 · 8 citations · h-index: 69
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective self-supervised pre-training methods in computer vision, though it appears incremental as it builds on existing techniques.

The paper tackles the problem of unsupervised visual representation learning by combining masked image modeling and contrastive learning into a unified framework called MACRL, achieving superior results on multiple vision benchmarks such as CIFAR-10, CIFAR-100, and Tiny-ImageNet.

Masked image modelling (e.g., Masked AutoEncoder) and contrastive learning (e.g., Momentum Contrast) have shown impressive performance on unsupervised visual representation learning. This work presents Masked Contrastive Representation Learning (MACRL) for self-supervised visual pre-training. In particular, MACRL leverages the effectiveness of both masked image modelling and contrastive learning. We adopt an asymmetric setting for the siamese network (i.e., encoder-decoder structure in both branches), where one branch uses a higher mask ratio and stronger data augmentation, while the other adopts weaker data corruptions. We optimize a contrastive learning objective based on the learned features from the encoders in both branches. Furthermore, we minimize the $L_1$ reconstruction loss according to the decoders' outputs. In our experiments, MACRL presents superior results on various vision benchmarks, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and two other ImageNet subsets. Our framework provides unified insights on self-supervised visual pre-training and future research.
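The objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract names only "a contrastive learning objective" and an $L_1$ reconstruction loss, so the InfoNCE form, the temperature, the mask ratios (0.75 and 0.25), the loss weight `recon_weight`, and the choice to average the reconstruction error over all patches are assumptions made here for illustration. The encoders and decoders are replaced by random arrays, and every function name is hypothetical.

```python
import numpy as np


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask over patches; True means the patch is hidden."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def info_nce(z_a, z_b, temperature=0.2):
    """InfoNCE loss (assumed form): row i of z_a is the positive for row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row-wise maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def l1_reconstruction(pred, target):
    """Mean absolute error between decoder output and the original patches."""
    return float(np.mean(np.abs(pred - target)))


def combined_loss(z_strong, z_weak, recon_strong, recon_weak, target,
                  recon_weight=1.0):
    """Contrastive term on encoder features plus L1 terms from both decoders."""
    contrastive = info_nce(z_strong, z_weak)
    recon = (l1_reconstruction(recon_strong, target)
             + l1_reconstruction(recon_weak, target))
    return contrastive + recon_weight * recon


# Toy usage with random stand-ins for encoder features and decoder outputs.
rng = np.random.default_rng(0)
batch, num_patches, patch_dim, feat_dim = 4, 16, 8, 32
target = rng.normal(size=(batch, num_patches, patch_dim))
mask_strong = random_mask(num_patches, 0.75, rng)  # heavily corrupted branch
mask_weak = random_mask(num_patches, 0.25, rng)    # lightly corrupted branch
z_strong = rng.normal(size=(batch, feat_dim))
z_weak = rng.normal(size=(batch, feat_dim))
recon_strong = target + 0.1 * rng.normal(size=target.shape)
recon_weak = target + 0.05 * rng.normal(size=target.shape)
loss = combined_loss(z_strong, z_weak, recon_strong, recon_weak, target)
print("combined loss:", loss)
```

In a real training loop the masks would select which patches each branch's encoder sees, and the features and reconstructions would come from the two encoder-decoder branches rather than from random arrays.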

