Improving self-training under distribution shifts via anchored confidence with theoretical guarantees
This work addresses a critical issue in semi-supervised learning for machine learning practitioners, offering a computationally efficient solution to enhance robustness under distribution shifts, though it is incremental in nature.
The paper tackles the problem of self-training under distribution shifts, where prediction confidence and accuracy become misaligned, and proposes a method using temporal consistency to improve performance by 8% to 16% across various scenarios without added computational cost.
Self-training often falls short under distribution shifts due to an increased discrepancy between prediction confidence and actual accuracy. This typically necessitates computationally demanding methods such as neighborhood or ensemble-based label corrections. Drawing inspiration from insights on early learning regularization, we develop a principled method to improve self-training under distribution shifts based on temporal consistency. Specifically, we build an uncertainty-aware temporal ensemble with a simple relative thresholding. Then, this ensemble smooths noisy pseudo labels to promote selective temporal consistency. We show that our temporal ensemble is asymptotically correct and our label smoothing technique can reduce the optimality gap of self-training. Our extensive experiments validate that our approach consistently improves self-training performances by 8% to 16% across diverse distribution shift scenarios without a computational overhead. Besides, our method exhibits attractive properties, such as improved calibration performance and robustness to different hyperparameter choices.