Phase Transitions between Accuracy Regimes in L2 regularized Deep Neural Networks

arXiv:2505.06597v2
h-index: 1
Originality: Incremental advance
AI Analysis

This work studies phase transitions and error-landscape dynamics in deep neural networks and is aimed at researchers in machine learning theory. It appears incremental, since it builds on existing concepts such as grokking and regularization effects.

The paper argues that increasing L2 regularization in deep neural networks triggers a first-order phase transition into an under-parametrized phase, and it links this transition to the scalar curvature of the error landscape. It predicts additional transition points and hysteresis effects, both of which are confirmed numerically, and it interprets 'grokking' as models getting stuck in a local minimum of the error surface.

Increasing the L2 regularization of Deep Neural Networks (DNNs) causes a first-order phase transition into the under-parametrized phase, the so-called onset-of-learning. We explain this transition via the scalar (Ricci) curvature of the error landscape. We predict new transition points as the data complexity is increased and, in accordance with the theory of phase transitions, the existence of hysteresis effects. We confirm both predictions numerically. Our results provide a natural explanation of the recently discovered phenomenon of 'grokking' as DNN models getting stuck in a local minimum of the error surface, corresponding to a lower accuracy phase. Our work paves the way for new probing methods of the intrinsic structure of DNNs in and beyond the L2 context.
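The mechanism described in the abstract can be illustrated with a deliberately tiny toy model. The sketch below is not the paper's model, data, or experiment; it is an assumed stand-in chosen for illustration. It uses a depth-3 scalar "network" with tied weights w, fitted to the target 1, with loss L(w) = 0.5*(w^3 - 1)^2 + 1.5*lam*w^2. In this toy, the trained branch w > 0 exists only while lam = w - w^4 has a solution, that is, up to lam ≈ 0.47. The collapsed state w = 0 is a local minimum for every lam > 0. Sweeping lam upward and then downward with warm-started gradient descent therefore traces two different branches, which is one simple way to picture a discontinuous jump with hysteresis. All names (`grad`, `relax`, `sweep`), the learning rate, and the grid are hypothetical choices.

```python
# Toy illustration only (an assumption, not the paper's setup):
# depth-3 scalar network with tied weights w, target output 1, L2 penalty lam.
# L(w) = 0.5 * (w**3 - 1)**2 + 1.5 * lam * w**2

def grad(w, lam):
    """Derivative dL/dw of the toy loss."""
    return 3.0 * w**2 * (w**3 - 1.0) + 3.0 * lam * w

def relax(w, lam, lr=0.05, steps=5000):
    """Plain gradient descent from w at fixed regularization lam."""
    for _ in range(steps):
        w -= lr * grad(w, lam)
    return w

def sweep(lams, w0):
    """Warm-started sweep: each lam starts from the previous solution."""
    w, path = w0, []
    for lam in lams:
        w = relax(w, lam)
        path.append(w)
    return path

lams = [i * 0.02 for i in range(41)]        # lam = 0.00, 0.02, ..., 0.80
up = sweep(lams, w0=1.0)                    # start trained, increase lam
down = sweep(lams[::-1], w0=1e-3)[::-1]     # start collapsed, decrease lam

for lam, w_up, w_down in zip(lams, up, down):
    print(f"lam={lam:.2f}  up-sweep w={w_up:.4f}  down-sweep w={w_down:.4f}")
```

On the upward sweep the weight jumps to zero between lam = 0.46 and lam = 0.48, while the downward sweep stays collapsed over the whole grid. That the toy never recovers is an artifact of noiseless gradient descent on a tied scalar weight. It should not be read as a prediction about real DNNs, where the paper reports its own numerically confirmed transition points.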

