Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

arXiv:2604.0941250.2
Predicted impact: top 28% in ML (last 90 days) · Originality: highly original
AI Analysis

This paper offers foundational insight into a core question in neural network optimization: how the loss landscape is structured, and which of its minima SGD actually reaches.

The paper characterizes local minima in the population loss landscape of high-dimensional two-layer ReLU networks, in a teacher-student setting with Gaussian covariates. It shows that local minima admit an exact low-dimensional representation, that they correspond to attractive fixed points of the SGD dynamics, and that global minima become more accessible in the overparameterized regime.

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
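The claim that the loss can be written exactly in terms of summary statistics can be made concrete with a short sketch. It assumes a squared loss, covariates $x \sim \mathcal{N}(0, I_d)$, and a teacher of the same form $\sum_{m=1}^{K^*} \mathrm{ReLU}(w_m^{*\top} x)$; none of these details beyond the model form is fixed by the abstract. For Gaussian inputs, $\mathbb{E}[\mathrm{ReLU}(u^\top x)\,\mathrm{ReLU}(v^\top x)] = \frac{\|u\|\|v\|}{2\pi}\big(\sin\theta + (\pi - \theta)\cos\theta\big)$, where $\theta$ is the angle between $u$ and $v$ (the degree-1 arc-cosine kernel). The loss therefore depends on the weights only through the overlaps $Q = WW^\top$, $M = WW^{*\top}$ and $P = W^*W^{*\top}$. All function names below are hypothetical illustrations, not the authors' code. The one-pass SGD loop uses a $1/d$ step-size scaling, a common convention in this literature but an assumption here.

```python
import numpy as np


def relu_kernel(a, b, c):
    """E[ReLU(u.x) ReLU(v.x)] for x ~ N(0, I_d), given a=|u|^2, b=|v|^2, c=u.v.

    Degree-1 arc-cosine kernel; the inputs broadcast elementwise.
    """
    norm = np.sqrt(a * b)
    safe = np.where(norm > 0, norm, 1.0)          # avoid 0/0 for zero vectors
    cos_t = np.clip(c / safe, -1.0, 1.0)
    theta = np.arccos(cos_t)
    val = norm / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)
    return np.where(norm > 0, val, 0.0)


def summary_statistics(W, W_star):
    """Overlaps: Q = W W^T (student), M = W W*^T (cross), P = W* W*^T (teacher)."""
    return W @ W.T, W @ W_star.T, W_star @ W_star.T


def population_loss(Q, M, P):
    """E[(f_W(x) - f_W*(x))^2], written purely in terms of (Q, M, P)."""
    q, p = np.diag(Q), np.diag(P)
    student = relu_kernel(q[:, None], q[None, :], Q).sum()
    cross = relu_kernel(q[:, None], p[None, :], M).sum()
    teacher = relu_kernel(p[:, None], p[None, :], P).sum()
    return student - 2 * cross + teacher


def committee(X, W):
    """f_W(x) = sum_k ReLU(w_k^T x), evaluated row-wise on a batch X."""
    return np.maximum(X @ W.T, 0.0).sum(axis=1)


def monte_carlo_loss(W, W_star, n, rng):
    """Sample-based estimate of the same population loss, for comparison."""
    X = rng.standard_normal((n, W.shape[1]))
    return np.mean((committee(X, W) - committee(X, W_star)) ** 2)


def one_pass_sgd(W, W_star, lr, steps, rng):
    """One-pass SGD on fresh Gaussian samples; records the loss via (Q, M, P)."""
    W = W.copy()
    d = W.shape[1]
    losses = []
    for _ in range(steps):
        x = rng.standard_normal(d)                    # a fresh sample every step
        pre = W @ x
        err = np.maximum(pre, 0.0).sum() - np.maximum(W_star @ x, 0.0).sum()
        grad = 2 * err * (pre > 0)[:, None] * x[None, :]  # gradient of err^2 w.r.t. W
        W -= (lr / d) * grad                          # step size scaled by 1/d
        losses.append(population_loss(*summary_statistics(W, W_star)))
    return W, np.array(losses)


rng = np.random.default_rng(0)
d, K, K_star = 50, 3, 2
W_star = rng.standard_normal((K_star, d)) / np.sqrt(d)
W0 = rng.standard_normal((K, d)) / np.sqrt(d)
print(population_loss(*summary_statistics(W0, W_star)))
print(monte_carlo_loss(W0, W_star, 200_000, rng))  # close to the line above
W_final, losses = one_pass_sgd(W0, W_star, lr=0.2, steps=20 * d, rng=rng)
print(losses[0], losses[-1])
```

The first two printed values should agree up to Monte Carlo error, which is the practical content of "exact low-dimensional representation": the $K \times K$, $K \times K^*$ and $K^* \times K^*$ overlap matrices replace the $K \times d$ weight matrix. Tracking the SGD trajectory through `population_loss(Q, M, P)` is the summary-statistics view of the dynamics; whether a run ends at a global or a spurious minimum depends on the initialisation and on $K$ relative to $K^*$, the regime dependence the abstract describes.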
