LG · ML · Oct 28, 2025

From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning

arXiv:2510.24812v1 · 5 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap in understanding weak-to-strong generalization for structured data, offering incremental insights into specific neural network architectures.

The paper gives a formal analysis of weak-to-strong generalization from a linear CNN to a two-layer ReLU CNN and identifies distinct mechanisms in the data-scarce and data-abundant regimes: in the former it characterizes the transition boundary between benign and harmful overfitting, and in the latter it identifies early-phase label correction, which overtraining can subsequently degrade.

Weak-to-strong generalization refers to the phenomenon where a stronger model trained under supervision from a weaker one can outperform its teacher. While prior studies aim to explain this effect, most theoretical insights are limited to abstract frameworks or linear/random feature models. In this paper, we provide a formal analysis of weak-to-strong generalization from a linear CNN (weak) to a two-layer ReLU CNN (strong). We consider structured data composed of label-dependent signals of varying difficulty and label-independent noise, and analyze gradient descent dynamics when the strong model is trained on data labeled by the pretrained weak model. Our analysis identifies two regimes, data-scarce and data-abundant, based on the signal-to-noise characteristics of the dataset, and reveals distinct mechanisms of weak-to-strong generalization. In the data-scarce regime, generalization occurs via benign overfitting or fails via harmful overfitting, depending on the amount of data, and we characterize the transition boundary. In the data-abundant regime, generalization emerges in the early phase through label correction, but we observe that overtraining can subsequently degrade performance.
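To make the setting described in the abstract concrete, here is a minimal NumPy simulation sketch. It is not the authors' code and does not claim to reproduce their results. Everything specific in it is an assumption chosen for illustration: the two-patch data model, the two hypothetical signal vectors of different norm (standing in for "signals of varying difficulty"), the single-filter weak model, the logistic loss, the strong-model parameterization `F_j(W, x) = (1/m) * sum_r sum_p ReLU(<w_{j,r}, x_p>)` with `f = F_{+1} - F_{-1}`, and all sizes and step sizes. The weak linear CNN is fit on clean labels and then labels fresh data. The strong ReLU CNN is trained by full-batch gradient descent on those weak labels only and is scored against the true labels.

```python
import numpy as np


def make_data(n, d, signals, sigma_p, rng):
    """Hypothetical two-patch data: x = [y * mu_k, xi], with xi ~ N(0, sigma_p^2 I_d).

    Each example draws one signal mu_k uniformly from `signals`; a smaller-norm
    signal stands in for a "harder" feature (assumption, not the paper's spec).
    """
    y = rng.choice([-1, 1], size=n)
    k = rng.integers(len(signals), size=n)
    X = np.empty((n, 2, d))
    X[:, 0, :] = y[:, None] * signals[k]                # label-dependent signal patch
    X[:, 1, :] = sigma_p * rng.standard_normal((n, d))  # label-independent noise patch
    return X, y


def loss_grad_coef(margins):
    # l(z) = log(1 + exp(-z))  =>  l'(z) = -1 / (1 + exp(z)), written stably via tanh
    return -0.5 * (1.0 - np.tanh(margins / 2.0))


def mean_logistic_loss(margins):
    return np.mean(np.logaddexp(0.0, -margins))


def train_weak_linear_cnn(X, y, lr=0.01, steps=200):
    """Weak teacher (assumed form): linear CNN f(w, x) = sum_p <w, x_p>, GD on clean labels."""
    S = X.sum(axis=1)  # with one shared linear filter, the model is linear in the patch sum
    w = np.zeros(S.shape[1])
    for _ in range(steps):
        coef = loss_grad_coef(y * (S @ w)) * y
        w -= lr * (coef @ S) / len(y)
    return w


def weak_predict(w, X):
    return np.where(X.sum(axis=1) @ w >= 0, 1, -1)


def strong_forward(W, X):
    """Two-layer ReLU CNN: f = F_{+1} - F_{-1}, F_j = (1/m) sum_r sum_p ReLU(<w_{j,r}, x_p>)."""
    m = W.shape[1]
    pre = np.einsum("npd,jmd->njmp", X, W)             # pre-activations, shape (n, 2, m, P)
    F = np.maximum(pre, 0.0).sum(axis=(2, 3)) / m      # shape (n, 2)
    return F[:, 0] - F[:, 1], pre


def train_strong_relu_cnn(X, y_tilde, m, sigma0, lr, steps, rng):
    """Strong student: full-batch GD on logistic loss against the weak labels y_tilde."""
    n, _, d = X.shape
    W = sigma0 * rng.standard_normal((2, m, d))        # W[0]: j=+1 filters, W[1]: j=-1 filters
    sign = np.array([1.0, -1.0])[:, None, None]
    losses = []
    for _ in range(steps):
        f, pre = strong_forward(W, X)
        margins = y_tilde * f
        losses.append(mean_logistic_loss(margins))     # loss recorded before each update
        g = loss_grad_coef(margins) * y_tilde / n      # shape (n,)
        active = (pre > 0).astype(float)               # ReLU derivative
        grad = np.einsum("n,njmp,npd->jmd", g, active, X) * sign / m
        W -= lr * grad
    return W, losses


def run_pipeline(seed=0, d=200, n_weak=20, n_strong=100, n_test=1000, m=10,
                 sigma_p=1.0, steps=300):
    rng = np.random.default_rng(seed)
    signals = np.zeros((2, d))
    signals[0, 0] = 4.0   # hypothetical "easy" signal (large norm)
    signals[1, 1] = 0.5   # hypothetical "hard" signal (small norm)

    Xw, yw = make_data(n_weak, d, signals, sigma_p, rng)
    w = train_weak_linear_cnn(Xw, yw)

    Xs, ys = make_data(n_strong, d, signals, sigma_p, rng)
    y_tilde = weak_predict(w, Xs)                      # the student never sees ys
    W, losses = train_strong_relu_cnn(Xs, y_tilde, m, sigma0=0.01, lr=0.01,
                                      steps=steps, rng=rng)

    Xt, yt = make_data(n_test, d, signals, sigma_p, rng)
    f_test, _ = strong_forward(W, Xt)
    return {
        "weak_label_acc": np.mean(y_tilde == ys),
        "weak_test_acc": np.mean(weak_predict(w, Xt) == yt),
        "strong_test_acc": np.mean(np.where(f_test >= 0, 1, -1) == yt),
        "losses": losses,
    }


if __name__ == "__main__":
    result = run_pipeline(seed=0)
    print(result["weak_test_acc"], result["strong_test_acc"])
```

Sweeping `n_strong` or `sigma_p`, or logging test accuracy inside the training loop, are natural ways to probe data-scarce versus data-abundant behaviour and the effect of overtraining empirically; the regime boundaries derived in the paper are not encoded in this sketch.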

