LG Oct 17, 2025

On the Neural Feature Ansatz for Deep Neural Networks

Edward Tansley, Estelle Massart, Coralia Cartis

arXiv:2510.15563v1 | 9.41 citations | h-index: 32

Originality: Incremental advance

AI Analysis

This paper provides incremental theoretical insights into feature learning for researchers in deep learning theory.

The paper tackles the problem of understanding feature learning in deep neural networks by extending the Neural Feature Ansatz (NFA) to multi-layer linear networks, proving it holds with exponent α=1/L, and showing it fails for some nonlinear architectures.

Understanding feature learning is an important open question in establishing a mathematical foundation for deep neural networks. The Neural Feature Ansatz (NFA) states that after training, the Gram matrix of the first-layer weights of a deep neural network is proportional to some power $\alpha > 0$ of the average gradient outer product (AGOP) of this network with respect to its inputs. Assuming gradient flow dynamics with balanced weight initialization, the NFA was proven to hold throughout training for two-layer linear networks with exponent $\alpha = 1/2$ (Radhakrishnan et al., 2024). We extend this result to networks with $L \geq 2$ layers, showing that the NFA holds with exponent $\alpha = 1/L$, thus demonstrating a depth dependency of the NFA. Furthermore, we prove that for unbalanced initialization, the NFA holds asymptotically through training if weight decay is applied. We also provide counterexamples showing that the NFA does not hold for some network architectures with nonlinear activations, even when these networks fit the training data arbitrarily well. We thoroughly validate our theoretical results through numerical experiments across a variety of optimization algorithms, weight decay rates and initialization schemes.
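The depth dependence $\alpha = 1/L$ can be illustrated numerically at a single balanced point of a deep linear network. The sketch below is not the paper's experimental code and involves no training. It assumes square layers, a hypothetical balanced construction $W_\ell = U_\ell S U_{\ell-1}^\top$ (orthogonal $U_\ell$, shared diagonal $S$), and it takes the AGOP of a linear network to be $W^\top W$ for the end-to-end matrix $W = W_L \cdots W_1$, since the input Jacobian is the same at every input. All function names are illustrative.

```python
import numpy as np

def random_orthogonal(d, rng):
    # Orthogonal factor from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def balanced_linear_weights(depth, d, rng):
    # Assumed construction: W_l = U_l S U_{l-1}^T with a shared diagonal S,
    # which gives W_{l+1}^T W_{l+1} = W_l W_l^T (balancedness).
    s = np.diag(rng.uniform(0.5, 1.5, size=d))
    us = [random_orthogonal(d, rng) for _ in range(depth + 1)]
    return [us[l + 1] @ s @ us[l].T for l in range(depth)]

def psd_power(m, p):
    # Fractional power of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh((m + m.T) / 2)
    vals = np.clip(vals, 0.0, None)
    return (vecs * vals**p) @ vecs.T

def nfa_gap(weights):
    # Frobenius distance between W_1^T W_1 and AGOP^(1/L).
    depth = len(weights)
    end_to_end = np.eye(weights[0].shape[1])
    for w in weights:
        end_to_end = w @ end_to_end
    agop = end_to_end.T @ end_to_end  # input Jacobian of a linear map is constant
    gram = weights[0].T @ weights[0]
    return np.linalg.norm(gram - psd_power(agop, 1.0 / depth))

rng = np.random.default_rng(0)
gaps = {L: nfa_gap(balanced_linear_weights(L, 4, rng)) for L in (2, 3, 5)}
```

Under these assumptions the gap should vanish up to floating-point error for each depth, whereas generic unbalanced weights give a clearly nonzero gap. This checks only the algebraic identity at a balanced point, not its preservation along gradient flow.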
