Adversarial Training Helps Transfer Learning via Better Representations
This provides theoretical insights for machine learning practitioners using transfer learning, though it is incremental as it builds on existing empirical observations.
The paper tackles the problem of understanding why adversarial training improves transfer learning, showing theoretically that it generates better representations, leading to more accurate predictors on target data, with empirical support on datasets and architectures.
Transfer learning aims to leverage models pre-trained on source data to efficiently adapt to target setting, where only limited data are available for model fine-tuning. Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains. However, why this happens is not known. In this paper, we provide a theoretical model to rigorously analyze how adversarial training helps transfer learning. We show that adversarial training in the source data generates provably better representations, so fine-tuning on top of this representation leads to a more accurate predictor of the target data. We further demonstrate both theoretically and empirically that semi-supervised learning in the source data can also improve transfer learning by similarly improving the representation. Moreover, performing adversarial training on top of semi-supervised learning can further improve transferability, suggesting that the two approaches have complementary benefits on representations. We support our theories with experiments on popular data sets and deep learning architectures.