Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
This work addresses a foundational problem in deep learning theory by offering a systematic analysis of scaling laws in the feature learning regime. The advance is incremental in scope: it extends the theory beyond linear models, but only to specific network types (quadratic and diagonal networks).
The paper develops a theoretical account of neural scaling laws in shallow neural networks. It derives the scaling exponents of the excess risk and links them to the spectral properties of the trained weights, which gives theoretical support to empirical observations relating power-law tails in the weight spectrum to generalization performance.
Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.
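The abstract does not spell out how diagonal networks connect to LASSO, so the following is only an illustrative sketch under an assumed parameterization, not the paper's construction. Assume the effective weight is the elementwise product w = u * v and that weight decay is the penalty (1/2)(|u|^2 + |v|^2). By the AM-GM inequality, the smallest such penalty over all factorizations of a fixed w equals the l1 norm of w, which is the LASSO regularizer. The function names below are hypothetical.

```python
import math
import random

def l2_penalty(u, v):
    # Weight-decay penalty on the two layers (assumed form): 0.5 * (|u|^2 + |v|^2).
    return 0.5 * (sum(a * a for a in u) + sum(b * b for b in v))

def balanced_factorization(w):
    # Factor w elementwise as u * v with |u_i| = |v_i| = sqrt(|w_i|).
    u = [math.sqrt(abs(x)) for x in w]
    v = [math.copysign(math.sqrt(abs(x)), x) for x in w]
    return u, v

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(8)]
l1_norm = sum(abs(x) for x in w)

# The balanced factorization attains the l1 norm exactly.
u, v = balanced_factorization(w)
balanced_cost = l2_penalty(u, v)

# Any rescaling (u_i * c_i, v_i / c_i) represents the same w but costs at least as much.
c = [random.uniform(0.5, 2.0) for _ in w]
u_rescaled = [a * ci for a, ci in zip(u, c)]
v_rescaled = [b / ci for b, ci in zip(v, c)]
rescaled_cost = l2_penalty(u_rescaled, v_rescaled)

print(l1_norm, balanced_cost, rescaled_cost)
```

Under these assumptions, minimizing a data-fitting loss plus weight decay over (u, v) has the same minimizers in w as an l1-regularized problem, which is one standard way to read a diagonal-network-to-LASSO connection.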