Does Weak-to-strong Generalization Happen under Spurious Correlations?
This paper addresses a key problem in weak-to-strong generalization that matters to machine learning practitioners: how it behaves in scenarios with spurious correlations. It offers both theoretical insights and an effective algorithmic solution.
The paper investigates whether weak-to-strong generalization occurs when a strong pre-trained model is fine-tuned with pseudolabels from a weaker teacher on tasks with spurious correlations. It finds that weak-to-strong generalization always happens with sufficient pseudolabels when the group imbalances of the labeled and unlabeled data match, but may fail when they differ, with the gain diminishing as the squared difference between the two imbalances grows. It also proposes a simple algorithm that retrains the student on its high-confidence data subset, achieving consistent improvements over vanilla fine-tuning.
We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S happen, and how can it be improved when it fails? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction $η_\ell$, and (ii) a group-imbalanced unlabeled set pseudolabeled by the teacher with a minority group of fraction $η_u$. Theoretically, a precise characterization of W2S gain at the proportional asymptotic limit shows that W2S always happens with sufficient pseudolabels when $η_u = η_\ell$ but may fail when $η_u \ne η_\ell$, where W2S gain diminishes as $(η_u - η_\ell)^2$ increases. Our theory is corroborated by extensive experiments on various spurious correlation benchmarks and teacher-student pairs. To boost W2S performance when it fails, we further propose a simple, effective algorithmic remedy that retrains the strong student on its high-confidence data subset after W2S fine-tuning. Our algorithm is group-label-free and achieves consistent, substantial improvements over vanilla W2S fine-tuning.
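To make the selection step of the remedy above concrete, here is a minimal sketch of picking a high-confidence subset from the student's predictions after W2S fine-tuning. It is an illustration under stated assumptions, not the paper's implementation: the function name `select_high_confidence`, the `keep_fraction` hyperparameter, the use of the maximum softmax probability as the confidence score, and the choice to return the student's own predicted labels for retraining are all hypothetical, since the abstract does not specify them. Note that the sketch needs no group labels, consistent with the group-label-free claim.

```python
import numpy as np


def select_high_confidence(probs, keep_fraction=0.5):
    """Pick the most confident samples from the student's predictions.

    probs: (n_samples, n_classes) array of student softmax outputs on the
        unlabeled set (assumed to come from the W2S fine-tuned student).
    keep_fraction: hypothetical hyperparameter, fraction of samples to keep.

    Returns (indices of kept samples, student's predicted labels for them).
    """
    probs = np.asarray(probs, dtype=float)
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Assumption: confidence is the maximum softmax probability per sample.
    confidence = probs.max(axis=1)
    n_keep = max(1, int(round(keep_fraction * len(probs))))

    # Sort by descending confidence; a stable sort breaks ties by input order.
    order = np.argsort(-confidence, kind="stable")
    keep_idx = np.sort(order[:n_keep])

    # Assumption: the retraining targets are the student's own predictions.
    return keep_idx, probs[keep_idx].argmax(axis=1)


# Toy example with four samples and two classes.
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.60, 0.40]])
idx, labels = select_high_confidence(probs, keep_fraction=0.5)
print(idx.tolist(), labels.tolist())  # prints [0, 2] [0, 1]
```

The returned indices would then define the subset on which the student is retrained. How that retraining is carried out (loss, number of epochs, choice of threshold) is left open here because the abstract does not describe it.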