ML · LG · Feb 5

Optimal scaling laws in learning hierarchical multi-index models

arXiv:2602.05846v1 · 3 citations · h-index: 21
Originality: Highly original
AI Analysis

Aimed at researchers in machine learning theory, this work advances the theoretical understanding of neural network training dynamics with rigorous insights into scaling phenomena, though it is incremental in that it builds on existing theory for hierarchical models.

The paper tackles the problem of understanding scaling laws for two-layer neural networks learning hierarchical multi-index models, deriving exact information-theoretic scaling laws for subspace recovery and prediction error that reveal sequential learning through phase transitions. It shows these optimal rates are achieved by a simple spectral estimator, providing a unified explanation of scaling laws, plateaus, and spectral structure in shallow networks.

In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are sequentially learned through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned statistically optimally, using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.
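The general idea of a spectral estimator for a hierarchical multi-index target can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration, not the paper's construction. The toy target is a sum of second Hermite polynomials He2(z) = z^2 - 1 of orthonormal projections with decaying weights a_k. The estimator takes the top eigenvectors of the label-weighted matrix M = (1/n) Σ_i y_i (x_i x_iᵀ - I). The helper names `hierarchical_target`, `spectral_estimator` and `overlaps` are hypothetical. For Gaussian inputs, E[M] = 2 Σ_k a_k w_k w_kᵀ, so the top eigenvectors align with the hidden directions, strongest first. This mirrors, qualitatively only, the sequential recovery described above.

```python
import numpy as np

def hierarchical_target(X, W, a):
    # Hypothetical toy hierarchical multi-index target (not the paper's model):
    # y = sum_k a_k * He2(<w_k, x>), with He2(z) = z**2 - 1 and decaying a_k.
    Z = X @ W.T                     # (n, k) projections on the hidden directions
    return (Z**2 - 1.0) @ a         # (n,) labels

def spectral_estimator(X, y, k):
    # Top-k eigenvectors of M = (1/n) * sum_i y_i * (x_i x_i^T - I).
    # For Gaussian x and the toy target above, E[M] = 2 * sum_k a_k w_k w_k^T.
    n, d = X.shape
    M = (X.T * y) @ X / n - y.mean() * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    return eigvecs[:, top].T                    # (k, d), strongest direction first

def overlaps(W_hat, W):
    # |<w_hat_j, w_j>| per direction, assuming rows are sorted by decreasing a_k;
    # 1 means perfect recovery (the sign of an eigenvector is not identifiable).
    return np.abs(np.sum(W_hat * W, axis=1))

rng = np.random.default_rng(0)
d, k = 10, 3
W = np.linalg.qr(rng.standard_normal((d, k)))[0].T  # (k, d) orthonormal directions
a = np.array([4.0, 2.0, 1.0])                        # decaying weights: the "hierarchy"

# Weaker directions typically need more samples before their overlap approaches 1.
for n in (200, 2_000, 50_000):
    X = rng.standard_normal((n, d))
    y = hierarchical_target(X, W, a)
    W_hat = spectral_estimator(X, y, k)
    print(n, np.round(overlaps(W_hat, W), 3))
```

Weighting the empirical covariance by the labels is a common spectral approach for multi-index models. The paper's own estimator, its preprocessing of the labels, and its optimality guarantees may differ from this sketch.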
