Grokking as a Phase Transition between Competing Basins: a Singular Learning Theory Approach
This work provides a theoretical framework for understanding grokking in neural networks, an incremental advance in explaining generalization phenomena in machine learning.
The paper tackles the phenomenon of grokking, where neural networks transition abruptly from memorization to generalization after prolonged training, by interpreting it as a phase transition between competing solution basins using Singular Learning Theory. The authors derive closed-form expressions for the local learning coefficient (LLC) in quadratic networks on modular arithmetic tasks, verify them empirically, and show that LLC trajectories reliably track generalization dynamics.
Grokking, the abrupt transition from memorization to generalisation after extended training, suggests the presence of competing solution basins with distinct statistical properties. We study this phenomenon through the lens of Singular Learning Theory (SLT), a Bayesian framework that characterizes the geometry of the loss landscape via the local learning coefficient (LLC), a measure of the local degeneracy of the loss surface. SLT links lower-LLC basins to higher posterior mass concentration and lower expected generalisation error. Leveraging this theory, we interpret grokking in quadratic networks as a phase transition between competing near-zero-loss solution basins. Our contributions are two-fold: we derive closed-form expressions for the LLC in quadratic networks trained on modular arithmetic tasks and verify them empirically; and we provide empirical evidence that LLC trajectories are a reliable tool for tracking generalisation dynamics and interpreting phase transitions during training.
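To make the notion of "local degeneracy" concrete, the following is a minimal sketch on a one-dimensional toy loss, not the paper's quadratic networks or its derivation. It assumes the commonly used estimator form n * beta * (E[L(w)] - L(w*)), with the expectation taken under a tempered posterior proportional to exp(-n * beta * L(w)) and beta = 1 / log(n). The function name `llc_estimate`, the grid quadrature (in place of sampling), and the toy losses w^2 and w^4 are all illustrative assumptions.

```python
import numpy as np


def llc_estimate(loss, n=10_000, half_width=1.0, num_points=200_001):
    """Estimate the LLC at the minimum w* = 0 of a 1-D toy loss.

    Assumed estimator: n * beta * (E_beta[L(w)] - L(w*)), where E_beta is
    the expectation under the tempered posterior p(w) ∝ exp(-n*beta*L(w))
    and beta = 1 / log(n). The expectation is computed by quadrature on a
    uniform grid, which is only feasible in this low-dimensional toy case.
    """
    beta = 1.0 / np.log(n)
    w = np.linspace(-half_width, half_width, num_points)
    losses = loss(w)
    # Unnormalised posterior weights; subtracting the minimum loss keeps
    # the exponent non-positive and avoids overflow.
    weights = np.exp(-n * beta * (losses - losses.min()))
    expected_loss = np.sum(weights * losses) / np.sum(weights)
    return n * beta * (expected_loss - loss(0.0))


# A regular (non-degenerate) minimum versus a flatter, degenerate one.
regular_llc = llc_estimate(lambda w: w**2)
degenerate_llc = llc_estimate(lambda w: w**4)
print(f"w^2 basin: {regular_llc:.3f}, w^4 basin: {degenerate_llc:.3f}")
```

Under these assumptions the flatter w^4 basin yields the smaller estimate (about 1/4 versus about 1/2 for w^2), which matches the intuition in the abstract that more degenerate basins have a lower LLC. In a real network the expectation would have to be approximated by sampling around the trained weights rather than by a grid.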