Tradeoffs Between Contrastive and Supervised Learning: An Empirical Study
This work highlights tradeoffs in pretraining methods for computer vision practitioners, particularly those with limited compute, and is incremental in refining existing practices.
The study investigates when supervised pretraining outperforms contrastive learning, finding that supervised learning is better with small pretraining budgets on eight image classification datasets and more resilient to certain corruptions and biases with larger budgets.
Contrastive learning has made considerable progress in computer vision, outperforming supervised pretraining on a range of downstream datasets. However, is contrastive learning the better choice in all situations? We demonstrate two cases where it is not. First, under sufficiently small pretraining budgets, supervised pretraining on ImageNet consistently outperforms a comparable contrastive model on eight diverse image classification datasets. This suggests that the common practice of comparing pretraining approaches at hundreds or thousands of epochs may not produce actionable insights for those with more limited compute budgets. Second, even with larger pretraining budgets we identify tasks where supervised learning prevails, perhaps because the object-centric bias of supervised pretraining makes the model more resilient to common corruptions and spurious foreground-background correlations. These results underscore the need to characterize tradeoffs of different pretraining objectives across a wider range of contexts and training regimes.