CVApr 30, 2019

A critical analysis of self-supervision, or what we can learn from a single image

arXiv:1904.13132v3152 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of understanding the limitations and capabilities of self-supervised learning for computer vision researchers, revealing that low-level statistics can be efficiently learned without large datasets, though it is incremental in highlighting gaps for deeper layers.

The paper critically analyzes self-supervision techniques for deep convolutional neural networks, showing that methods like BiGAN, RotNet, and DeepCluster can learn early layers from a single image with strong data augmentation as effectively as using millions of labeled images, but deeper layers still lag behind manual supervision.

We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that: (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, that (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and that (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes