Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift
This work addresses improving model robustness under distribution shift, offering machine learning practitioners incremental insights into how two existing techniques combine.
The paper investigates combining self-training and contrastive learning. In unsupervised domain adaptation, the two are complementary: across eight distribution shift datasets, the combined method is 3–8% more accurate than either approach alone. In semi-supervised learning, by contrast, their benefits are not synergistic.
Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity and compatibility of these techniques, their efficacy in combination remains unexplored. In this paper, we undertake a systematic empirical investigation of this combination, finding that (i) in domain adaptation settings, self-training and contrastive learning offer significant complementary gains; and (ii) in semi-supervised learning settings, surprisingly, the benefits are not synergistic. Across eight distribution shift datasets (e.g., BREEDs, WILDS), we demonstrate that the combined method obtains 3–8% higher accuracy than either approach independently. We then theoretically analyze these techniques in a simplified model of distribution shift, demonstrating scenarios under which the features produced by contrastive learning can yield a good initialization for self-training to further amplify gains and achieve optimal performance, even when either method alone would fail.
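The combined pipeline the abstract describes can be sketched in a few stages: contrastive pretraining on unlabeled source and target data, a linear head fit on labeled source data, then self-training on pseudo-labeled target data, warm-started from that head. The NumPy sketch below is illustrative only and is not the authors' implementation. Several choices are assumptions rather than details from the paper. A batch spectral contrastive loss (HaoChen et al., 2021) stands in for the contrastive objective. The encoder is linear, and the augmentation is additive Gaussian noise. The toy data generator `make_domain` is hypothetical and is not the paper's simplified model of distribution shift. All other function names (`augment`, `contrastive_step`, `pretrain`, `fit_logreg`, `self_train`, `accuracy`) are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy run is reproducible


def make_domain(n, spurious_strength):
    """Hypothetical toy domain (not the paper's model): dim 0 predicts the label
    in every domain; dim 1 is a spurious cue whose strength varies by domain."""
    y = rng.integers(0, 2, size=n)
    s = 2.0 * y - 1.0  # labels mapped to {-1, +1}
    X = rng.normal(scale=0.5, size=(n, 6))
    X[:, 0] += s
    X[:, 1] += spurious_strength * s
    return X, y.astype(float)


def augment(X):
    """Assumed augmentation: additive Gaussian noise."""
    return X + rng.normal(scale=0.3, size=X.shape)


def contrastive_step(W, X, lr):
    """One gradient step on a batch spectral contrastive loss for f(x) = W x:
    L = -2 mean_i <f(a_i), f(b_i)> + mean_{i != j} <f(a_i), f(b_j)>^2,
    where a_i, b_i are two augmented views of example i."""
    A, B = augment(X), augment(X)
    n = len(X)
    G = (A @ W.T) @ (B @ W.T).T          # G[i, j] = <f(a_i), f(b_j)>
    G_off = G * (1.0 - np.eye(n))        # keep only i != j (negative pairs)
    m = n * (n - 1)
    loss = -2.0 * np.trace(G) / n + np.sum(G_off**2) / m
    grad = -2.0 / n * W @ (A.T @ B + B.T @ A)
    grad += 2.0 / m * W @ (A.T @ G_off @ B + B.T @ G_off.T @ A)
    return W - lr * grad, loss


def pretrain(X, k=3, steps=400, lr=0.01):
    """Contrastive pretraining of a k-dimensional linear encoder on unlabeled X."""
    W = 0.1 * rng.normal(size=(k, X.shape[1]))
    losses = []
    for _ in range(steps):
        W, loss = contrastive_step(W, X, lr)
        losses.append(loss)
    return W, losses


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def fit_logreg(F, y, w=None, b=0.0, steps=300, lr=0.5):
    """Logistic regression by full-batch gradient descent, optionally warm-started."""
    w = np.zeros(F.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        g = sigmoid(F @ w + b) - y
        w -= lr * F.T @ g / len(y)
        b -= lr * g.mean()
    return w, b


def self_train(F, w, b, rounds=5):
    """Self-training: pseudo-label unlabeled data with the current head, then refit."""
    for _ in range(rounds):
        pseudo = (F @ w + b > 0).astype(float)
        w, b = fit_logreg(F, pseudo, w=w, b=b)
    return w, b


def accuracy(F, y, w, b):
    return float(np.mean((F @ w + b > 0) == (y == 1)))


X_src, y_src = make_domain(200, spurious_strength=2.0)  # labeled source
X_tgt, y_tgt = make_domain(200, spurious_strength=0.0)  # unlabeled target

W, losses = pretrain(np.vstack([X_src, X_tgt]))  # uses no labels
F_src, F_tgt = X_src @ W.T, X_tgt @ W.T
w, b = fit_logreg(F_src, y_src)                  # head trained on source labels only
acc_before = accuracy(F_tgt, y_tgt, w, b)
w, b = self_train(F_tgt, w, b)                   # target labels never used for training
acc_after = accuracy(F_tgt, y_tgt, w, b)
print(f"target accuracy: source head {acc_before:.3f}, after self-training {acc_after:.3f}")
```

The printed accuracies depend entirely on the toy data and hyperparameters. The sketch is not meant to reproduce the paper's reported gains. Instead, it shows where each stage sits. Contrastive pretraining sees unlabeled source and target data. The head sees source labels. Self-training sees only target pseudo-labels. Warm-starting self-training from the source-trained head on contrastive features loosely mirrors the abstract's point that those features can serve as an initialization for self-training.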