How Useful is Self-Supervised Pretraining for Visual Tasks?
This work offers practitioners insight into when and how to use self-supervised pretraining for visual tasks. Its contribution is incremental: it evaluates existing methods rather than introducing new ones.
The paper investigates which factors affect the utility of self-supervised pretraining for vision tasks by evaluating existing algorithms on synthetic datasets. It finds that linear evaluation does not correlate with finetuning performance, and it analyzes how utility changes with label availability and data properties.
Recent advances have spurred incredible progress in self-supervised pretraining for vision. We investigate what factors may play a role in the utility of these pretraining methods for practitioners. To do this, we evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks. We prepare a suite of synthetic data that enables an endless supply of annotated images as well as full control over dataset difficulty. Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows, as well as how the utility changes as a function of the downstream task and the properties of the training data. We also find that linear evaluation does not correlate with finetuning performance. Code and data are available at \href{https://www.github.com/princeton-vl/selfstudy}{github.com/princeton-vl/selfstudy}.
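The claim that linear evaluation does not correlate with finetuning performance can be checked, for any set of pretrained models, by comparing how the two protocols rank those models. The following is a minimal sketch of one way to do this using Spearman rank correlation. It is an illustration under stated assumptions, not the analysis used in the paper: the choice of Spearman correlation, the function names, and the placeholder scores are all hypothetical and are not taken from the paper or its code release.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


# Placeholder accuracies for four hypothetical pretrained models.
# These numbers are invented for illustration, NOT results from the paper.
linear_eval_acc = [0.40, 0.55, 0.60, 0.70]
finetune_acc = [0.82, 0.79, 0.85, 0.80]

rho = spearman(linear_eval_acc, finetune_acc)
print(f"Spearman correlation between protocols: {rho:.2f}")
```

A value of `rho` near 1 would mean the two protocols rank the models the same way, while a value near 0 would mean that ranking models by linear evaluation says little about their ranking after finetuning.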