LG, ML · Mar 4, 2025

Weak-to-Strong Generalization Even in Random Feature Networks, Provably

Tsinghua
arXiv:2503.02877v3 · 12 citations · h-index: 27 · ICML
Originality: Incremental advance
AI Analysis

The paper provides theoretical insight into generalization phenomena in machine learning, but its contribution is incremental because it builds on prior work in a simplified model.

The paper tackles weak-to-strong generalization. Using random feature networks trained with early stopping, it shows that a strong student model can outperform a weak teacher even when trained only on teacher-generated labels, and it proves quantitative limits on this effect.

Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a strong learner like GPT-4. We consider student and teacher that are random feature models, described by two-layer networks with a random and fixed bottom layer and a trained top layer. A "weak" teacher, with a small number of units (i.e. random features), is trained on the population, and a "strong" student, with a much larger number of units (i.e. random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and understand how the student can outperform the teacher, even though trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. Importantly, we also show the quantitative limits of weak-to-strong generalization in this model.
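The teacher/student construction described in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration under assumptions not taken from the paper: Gaussian inputs, ReLU random features, a made-up target `f_star`, a teacher whose top layer is fit by near-exact least squares on a large true-labelled sample standing in for the population, and a student whose top layer is trained by full-batch gradient descent from zero on teacher labels only. The student's error against the true target is recorded after every step, and the minimum along that curve is an oracle choice of stopping time used only to show where early stopping would land; it is not the paper's stopping rule. All names, sizes, and step sizes are illustrative, and nothing here guarantees that the student beats the teacher for these particular settings.

```python
import math
import random

# Hypothetical sizes for illustration only; none of these come from the paper.
D = 5            # input dimension
M_TEACHER = 4    # "weak" teacher: few random features
M_STUDENT = 80   # "strong" student: many random features
N_POP = 2000     # large true-labelled sample standing in for the population
N_STUDENT = 200  # inputs the student sees, labelled only by the teacher
N_TEST = 300     # held-out inputs for measuring error against the true target


def f_star(x):
    # Hypothetical ground-truth target.
    return math.tanh(x[0]) + 0.5 * x[1] * x[2]


def sample_x(n, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(n)]


def random_bottom_layer(m, rng):
    # Bottom-layer weights are drawn once and never trained.
    return [[rng.gauss(0.0, 1.0) / math.sqrt(D) for _ in range(D)] for _ in range(m)]


def features(W, x):
    # ReLU random features phi_j(x) = max(0, <w_j, x>); ReLU is an assumption.
    return [max(0.0, sum(wi * xi for wi, xi in zip(w, x))) for w in W]


def predict(a, phi):
    # The trained top layer is a linear readout of the fixed features.
    return sum(ai * p for ai, p in zip(a, phi))


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def least_squares(Phi, y, lam=1e-6):
    # Solve (Phi^T Phi / n + lam I) a = Phi^T y / n by Gaussian elimination.
    m, n = len(Phi[0]), len(y)
    A = [[sum(p[i] * p[j] for p in Phi) / n + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(p[i] * t for p, t in zip(Phi, y)) / n for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    a = [0.0] * m
    for i in reversed(range(m)):
        a[i] = (b[i] - sum(A[i][j] * a[j] for j in range(i + 1, m))) / A[i][i]
    return a


def student_error_curve(Phi, y, Phi_test, y_test, steps):
    # Full-batch gradient descent on the top layer from a = 0, recording
    # the test error against f_star after every step.
    m, n = len(Phi[0]), len(y)
    # Step size 1 / trace(Hessian) is stable: the top eigenvalue <= trace.
    lr = 1.0 / (2.0 * sum(v * v for p in Phi for v in p) / n)
    a, errs = [0.0] * m, []
    for _ in range(steps):
        grad = [0.0] * m
        for p, t in zip(Phi, y):
            r = predict(a, p) - t
            for j in range(m):
                grad[j] += 2.0 * r * p[j] / n
        a = [aj - lr * gj for aj, gj in zip(a, grad)]
        errs.append(mse([predict(a, p) for p in Phi_test], y_test))
    return errs


def run(seed=0, steps=100):
    rng = random.Random(seed)
    x_test = sample_x(N_TEST, rng)
    y_test = [f_star(x) for x in x_test]

    # Weak teacher: few features, top layer fit on a large true-labelled sample.
    W_t = random_bottom_layer(M_TEACHER, rng)
    x_pop = sample_x(N_POP, rng)
    a_t = least_squares([features(W_t, x) for x in x_pop], [f_star(x) for x in x_pop])

    def teacher(x):
        return predict(a_t, features(W_t, x))

    teacher_err = mse([teacher(x) for x in x_test], y_test)

    # Strong student: many independent features, trained only on teacher labels.
    W_s = random_bottom_layer(M_STUDENT, rng)
    x_s = sample_x(N_STUDENT, rng)
    errs = student_error_curve(
        [features(W_s, x) for x in x_s], [teacher(x) for x in x_s],
        [features(W_s, x) for x in x_test], y_test, steps)
    best = min(range(steps), key=lambda k: errs[k])  # oracle early-stopping step
    return teacher_err, errs[-1], errs[best], best + 1
```

`run()` returns the teacher's test error, the student's test error at the last step, its lowest test error along the trajectory, and the step where that minimum occurs. Comparing the last two shows how much stopping early changes the student's error; varying `M_STUDENT`, `steps`, or `f_star` shows how the comparison with the teacher moves.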

