LG · Oct 16, 2025

A simple mean field model of feature learning

Niclas Göring, Chris Mingard, Yoonsoo Nam, Ard Louis

arXiv:2510.15174v1 · 4.1 · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the poorly understood mechanism of feature learning in neural networks. Its theoretical contribution is incremental, but it improves quantitative predictions of interest to researchers in machine learning theory.

The authors develop a mean-field theory of feature learning in neural networks that predicts a symmetry-breaking phase transition at which networks align with the target function. The basic theory underestimates the improvement in generalization after the transition; incorporating self-reinforcing input feature selection corrects this and quantitatively matches the learning curves.

Feature learning (FL), where neural networks adapt their internal representations during training, remains poorly understood. Using methods from statistical physics, we derive a tractable, self-consistent mean-field (MF) theory for the Bayesian posterior of two-layer non-linear networks trained with stochastic gradient Langevin dynamics (SGLD). At infinite width, this theory reduces to kernel ridge regression, but at finite width it predicts a symmetry-breaking phase transition where networks abruptly align with target functions. While the basic MF theory provides theoretical insight into the emergence of FL in the finite-width regime, semi-quantitatively predicting the onset of FL with noise or sample size, it substantially underestimates the improvements in generalisation after the transition. We trace this discrepancy to a key mechanism absent from the plain MF description: self-reinforcing input feature selection. Incorporating this mechanism into the MF theory allows us to quantitatively match the learning curves of SGLD-trained networks and provides mechanistic insight into FL.
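
For readers unfamiliar with the training procedure named in the abstract, the following is a minimal sketch of a Langevin update for a two-layer network. It is not the authors' implementation. The tanh activation, the 1/N output scaling, the squared-error loss, the use of full-batch gradients (SGLD proper uses minibatches), and all function and parameter names are assumptions made for illustration. The update rule itself is the standard one: a gradient step plus Gaussian noise of variance 2 * lr * T, so that for small step sizes the parameters sample approximately from a distribution proportional to exp(-loss / T).

```python
import math
import random


def forward(w, a, x):
    """Two-layer net f(x) = (1/N) * sum_i a_i * tanh(w_i . x) (assumed form)."""
    hidden = [math.tanh(sum(wij * xj for wij, xj in zip(wi, x))) for wi in w]
    return sum(ai * hi for ai, hi in zip(a, hidden)) / len(a), hidden


def loss(w, a, xs, ys):
    """Mean squared error with the conventional factor 1/2."""
    total = sum((forward(w, a, x)[0] - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * len(xs))


def langevin_step(w, a, xs, ys, lr, temperature, weight_decay, rng):
    """One full-batch update: theta <- theta - lr * grad + sqrt(2 * lr * T) * xi."""
    n, d, p = len(a), len(w[0]), len(xs)
    # Gradient of the L2 penalty 0.5 * weight_decay * |theta|^2.
    grad_w = [[weight_decay * wij for wij in wi] for wi in w]
    grad_a = [weight_decay * ai for ai in a]
    # Accumulate the gradient of the mean squared error over all samples.
    for x, y in zip(xs, ys):
        out, hidden = forward(w, a, x)
        err = (out - y) / p
        for i in range(n):
            grad_a[i] += err * hidden[i] / n
            back = err * a[i] * (1.0 - hidden[i] ** 2) / n  # tanh' = 1 - tanh^2
            for j in range(d):
                grad_w[i][j] += back * x[j]
    noise = math.sqrt(2.0 * lr * temperature)
    new_w = [[w[i][j] - lr * grad_w[i][j] + noise * rng.gauss(0.0, 1.0)
              for j in range(d)] for i in range(n)]
    new_a = [a[i] - lr * grad_a[i] + noise * rng.gauss(0.0, 1.0)
             for i in range(n)]
    return new_w, new_a


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
    ys = [x[0] for x in xs]  # hypothetical target: first input coordinate
    w = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
    a = [rng.gauss(0.0, 1.0) for _ in range(8)]
    for _ in range(500):
        w, a = langevin_step(w, a, xs, ys, lr=0.2, temperature=1e-4,
                             weight_decay=1e-3, rng=rng)
    print("final loss:", loss(w, a, xs, ys))
```

Setting `temperature` to zero recovers plain gradient descent with weight decay, which makes the role of the noise term easy to isolate in experiments.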
