CV | AI | Mar 2, 2021

Self-supervised Pretraining of Visual Features in the Wild

Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski

arXiv:2103.01988v236.4302 citations | Has Code

Originality: Incremental advance

AI Analysis

This work tests whether self-supervised learning for computer vision scales to large, uncurated datasets, and reports that it remains practical outside controlled benchmark settings.

The paper asks whether self-supervised learning stays effective when trained on random, uncurated images rather than a curated benchmark. Its SEER model reaches 84.2% top-1 accuracy on ImageNet, 1% above the best previous self-supervised pretrained model, and still reaches 77.9% top-1 accuracy when given only 10% of ImageNet.

Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, namely the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl
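The two evaluation quantities quoted in the abstract, top-1 accuracy and a 10% labeled subset, can be made concrete with a minimal sketch. It is not taken from the VISSL codebase: the function names `top1_accuracy` and `sample_label_subset` are hypothetical, and drawing the subset per class is an assumption about how such a split might be built, not the authors' protocol.

```python
import random
from collections import defaultdict


def top1_accuracy(scores, labels):
    """Fraction of examples whose highest-scoring class equals the true label.

    scores: list of per-class score lists, one per example.
    labels: list of integer class ids, one per example.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the model's top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


def sample_label_subset(labels, fraction, seed=0):
    """Pick a random `fraction` of example indices from each class.

    Assumption: the low-shot split keeps every class represented,
    so at least one example per class is always retained.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)
```

In this sketch, a low-shot evaluation would fine-tune only on the indices returned by `sample_label_subset(train_labels, 0.10)` and then report `top1_accuracy` on the full validation set.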
