CV Dec 13, 2018

When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets

Hong-Yu Zhou, Avital Oliver, Jianxin Wu, Yefeng Zheng

arXiv:1812.05313v114.125 citations h-index: 67

Originality Synthesis-oriented

AI Analysis

This work provides empirical insights for practitioners using SSL with transfer learning, though it is incremental as it systematically tests existing methods under new conditions.

The paper investigates how semi-supervised learning (SSL) techniques perform when fine-tuning pre-trained models. It finds that gains over fully-supervised baselines are smaller with pre-trained models than with random initialization, but grow when the source and target domains differ significantly. Some methods, such as Pseudo-Label, still improve on the fully-supervised baseline.

Semi-Supervised Learning (SSL) has been shown to be an effective way to leverage both labeled and unlabeled data at the same time. Recent semi-supervised approaches focus on deep neural networks and have achieved promising results on several benchmarks: CIFAR10, CIFAR100 and SVHN. However, most of their experiments are based on models trained from scratch instead of pre-trained models. On the other hand, transfer learning has demonstrated its value when the target domain has limited labeled data. This raises an intuitive question: is it possible to incorporate SSL when fine-tuning a pre-trained model? We comprehensively study how SSL methods starting from pre-trained models perform under varying conditions, including training strategies, architecture choice and datasets. From this study, we obtain several interesting and useful observations. While practitioners have had an intuitive understanding of these observations, we carry out a comprehensive empirical analysis and demonstrate that: (1) the gains from SSL techniques over a fully-supervised baseline are smaller when trained from a pre-trained model than when trained from random initialization, (2) when the domain of the source data used to train the pre-trained model differs significantly from the domain of the target task, the gains from SSL are significantly higher and (3) some SSL methods (like Pseudo-Label) are able to advance fully-supervised baselines. We hope our studies can deepen the understanding of SSL research and facilitate the process of developing more effective SSL methods to utilize pre-trained models. Code is now available on GitHub.
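For readers unfamiliar with Pseudo-Label, the method named in observation (3), the following is a minimal sketch of the general idea, not the authors' implementation. It assumes the class probabilities come from a model being fine-tuned from pre-trained weights; the function names, the confidence `threshold`, and the weight `alpha` are hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of `label` under a probability vector."""
    return -math.log(max(probs[label], 1e-12))


def pseudo_labels(unlabeled_probs, threshold=0.95):
    """Return (index, predicted class) pairs for confident unlabeled examples.

    The confidence threshold is an assumption of this sketch.
    """
    selected = []
    for i, p in enumerate(unlabeled_probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[label] >= threshold:
            selected.append((i, label))
    return selected


def ssl_loss(labeled_probs, labels, unlabeled_probs, alpha=1.0, threshold=0.95):
    """Supervised loss plus an alpha-weighted loss on pseudo-labeled examples."""
    sup = sum(cross_entropy(p, y) for p, y in zip(labeled_probs, labels)) / len(labels)
    picked = pseudo_labels(unlabeled_probs, threshold)
    if not picked:
        # No confident predictions: fall back to the supervised loss alone.
        return sup
    unsup = sum(cross_entropy(unlabeled_probs[i], y) for i, y in picked) / len(picked)
    return sup + alpha * unsup
```

In a real training loop the probability vectors would be the softmax outputs of the network on a labeled batch and an unlabeled batch, and the returned scalar would be minimized by gradient descent; this sketch only shows how the two loss terms are combined.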
