CV · Dec 1, 2021

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

arXiv:2112.00496v3 · 61 citations
Originality: Incremental advance
AI Analysis

This work makes supervised pretraining more competitive with unsupervised methods in computer vision. It is an incremental advance because it builds on existing pretraining paradigms rather than proposing a new one.

The paper tackles the transferability gap between unsupervised and supervised pretraining in visual learning. It identifies the MLP projector as a key factor and shows that adding an MLP projector before the classifier in supervised pretraining improves transfer, with gains of +7.2% top-1 accuracy on concept generalization and +5.8% top-1 accuracy for linear evaluation on 12-domain classification tasks.

The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While previous works focus on the effectiveness of the MLP in unsupervised image classification, where pretraining and evaluation are conducted on the same dataset, we reveal that the MLP projector is also the key factor behind the better transferability of unsupervised pretraining methods compared with supervised pretraining methods. Based on this observation, we attempt to close the transferability gap between supervised and unsupervised pretraining by adding an MLP projector before the classifier in supervised pretraining. Our analysis indicates that the MLP projector can help retain intra-class variation of visual features, decrease the feature distribution distance between pretraining and evaluation datasets, and reduce feature redundancy. Extensive experiments on public benchmarks demonstrate that the added MLP projector significantly boosts the transferability of supervised pretraining, e.g., +7.2% top-1 accuracy on the concept generalization task, +5.8% top-1 accuracy for linear evaluation on 12-domain classification tasks, and +0.8% AP on the COCO object detection task, making supervised pretraining comparable to or even better than unsupervised pretraining.
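The core change described in the abstract, an MLP projector inserted between the backbone and the classifier during supervised pretraining, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The projector layout (linear, batch norm, ReLU, linear), all dimensions, the random stand-in for backbone features, and the choice to discard the projector and classifier at transfer time are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
FEAT_DIM, HIDDEN_DIM, PROJ_DIM, NUM_CLASSES = 32, 64, 16, 10


def init_linear(fan_in, fan_out):
    """Return (weight, bias) for a linear layer with He-style initialisation."""
    weight = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    return weight, np.zeros(fan_out)


def linear(x, weight, bias):
    return x @ weight + bias


def batch_norm(x, eps=1e-5):
    """Training-mode batch norm without learnable scale and shift."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


params = {
    "proj1": init_linear(FEAT_DIM, HIDDEN_DIM),   # projector layer 1
    "proj2": init_linear(HIDDEN_DIM, PROJ_DIM),   # projector layer 2
    "cls": init_linear(PROJ_DIM, NUM_CLASSES),    # classifier on top of the projector
}


def mlp_projector(features, params):
    """Assumed projector layout: linear -> batch norm -> ReLU -> linear."""
    hidden = linear(features, *params["proj1"])
    hidden = np.maximum(batch_norm(hidden), 0.0)
    return linear(hidden, *params["proj2"])


def pretraining_logits(features, params):
    """Supervised pretraining head: backbone features -> projector -> classifier."""
    return linear(mlp_projector(features, params), *params["cls"])


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# Random stand-in for the output of a backbone network on a batch of 8 images.
features = rng.normal(size=(8, FEAT_DIM))
labels = rng.integers(0, NUM_CLASSES, size=8)

logits = pretraining_logits(features, params)
loss = cross_entropy(logits, labels)

# Assumption: at transfer time the projector and classifier are dropped,
# and downstream tasks consume the backbone features directly.
transfer_features = features
```

In this sketch the classification loss is applied only after the projector, so the backbone features are not themselves the direct input to the classifier. That separation is the structural difference from supervised pretraining without a projector.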
