MixCo: Mix-up Contrastive Learning for Visual Representation
This work aims to make self-supervised learning for visual tasks more effective, particularly in resource-constrained scenarios. The contribution is incremental, since it builds on existing contrastive learning methods.
The paper aims to improve self-supervised visual representation learning by introducing MixCo, which uses mix-up of images to create semi-positive pairs for contrastive learning. MixCo yields consistent test accuracy improvements on TinyImageNet, CIFAR10, and CIFAR100, with larger gains when model capacity is limited.
Contrastive learning has shown remarkable results in recent self-supervised approaches for visual representation. By learning to contrast the representations of positive pairs with those of the corresponding negative pairs, one can train good visual representations without human annotations. This paper proposes Mix-up Contrast (MixCo), which extends the contrastive learning concept to semi-positives encoded from the mix-up of positive and negative images. MixCo aims to learn the relative similarity of representations, reflecting the extent to which the mixed images contain the original positives. We validate the efficacy of MixCo when applied to recent self-supervised learning algorithms under the standard linear evaluation protocol on TinyImageNet, CIFAR10, and CIFAR100. In the experiments, MixCo consistently improves test accuracy. Remarkably, the improvement is more significant when the learning capacity (e.g., model size) is limited, suggesting that MixCo might be more useful in real-world scenarios. The code is available at: https://github.com/Lee-Gihun/MixCo-Mixup-Contrast.
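To make the idea of semi-positives concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a mixed image is a convex combination of two images with coefficient `lam`, and that the training signal is a cross-entropy between softmax-normalized cosine similarities and a soft target that puts weight `lam` on the first original and `1 - lam` on the second. The function names, the temperature `tau=0.1`, and the use of plain Python lists in place of an encoder are all illustrative assumptions.

```python
import math


def mix(x_a, x_b, lam):
    """Mix-up of two equally sized, flattened images (assumed form)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_contrastive_loss(z_mix, keys, idx_a, idx_b, lam, tau=0.1):
    """Cross-entropy against a soft target over the keys.

    The target places weight lam on keys[idx_a] and (1 - lam) on
    keys[idx_b]; every other key is treated as a pure negative.
    tau is a hypothetical temperature, not a value from the paper.
    """
    logits = [cosine(z_mix, k) / tau for k in keys]
    # Numerically stable log-softmax over the similarity logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - log_z for l in logits]
    return -(lam * log_probs[idx_a] + (1.0 - lam) * log_probs[idx_b])


# Toy usage: three orthogonal "representations" stand in for encoder outputs.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z_mix = mix(keys[0], keys[1], lam=0.7)

loss_matched = soft_contrastive_loss(z_mix, keys, 0, 1, lam=0.7)
loss_swapped = soft_contrastive_loss(z_mix, keys, 0, 1, lam=0.3)
print(loss_matched, loss_swapped)
```

In this toy setting the loss is lower when the soft target matches the true mixing ratio than when the weights are swapped, which is the sense in which such an objective rewards representations that reflect relative similarity to each original image.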