Grokking phase transitions in learning local rules with gradient descent
This paper offers theoretical insight into generalization in machine learning and is most relevant to researchers studying phase transitions and rule learning. The contribution is incremental, since it builds on existing grokking concepts.
The paper treats grokking (generalization after overfitting) as a phase transition in learning local rules. It derives exact analytic expressions for the critical exponents, the grokking probability, and the grokking time distribution, and shows that grokking arises from the locality of the teacher model. The predictions are compared with numerical results on a cellular automata learning task.
We discuss two solvable grokking (generalisation beyond overfitting) models in a rule learning scenario. We show that grokking is a phase transition and find exact analytic expressions for the critical exponents, grokking probability, and grokking time distribution. Further, we introduce a tensor-network map that connects the proposed grokking setup with the standard (perceptron) statistical learning theory and show that grokking is a consequence of the locality of the teacher model. As an example, we analyse the cellular automata learning task, numerically determine the critical exponent and the grokking time distributions and compare them with the prediction of the proposed grokking model. Finally, we numerically analyse the connection between structure formation and grokking.
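To make the notion of a local teacher rule concrete, the following is a minimal sketch of how a cellular automata learning dataset could be generated. The abstract does not say which automaton, lattice size, or sampling scheme the authors use, so the choice of an elementary (three-cell neighbourhood) automaton, Wolfram rule 30, periodic boundaries, and the names `eca_step` and `make_dataset` are all assumptions made for illustration.

```python
import numpy as np


def eca_step(state: np.ndarray, rule: int = 30) -> np.ndarray:
    """One update of an elementary cellular automaton (periodic boundaries).

    Assumption: the teacher is an elementary CA in Wolfram's rule numbering;
    the source does not specify the rule.
    """
    left = np.roll(state, 1, axis=-1)    # left[i]  = state[i - 1]
    right = np.roll(state, -1, axis=-1)  # right[i] = state[i + 1]
    # Encode each 3-cell neighbourhood as an integer in 0..7.
    idx = 4 * left + 2 * state + right
    # Bit k of the rule number is the output for neighbourhood k.
    table = (rule >> np.arange(8)) & 1
    return table[idx]


def make_dataset(n_samples: int, width: int, rule: int = 30, seed: int = 0):
    """Sample random binary configurations and label them with the teacher rule."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_samples, width))
    y = eca_step(x, rule)
    return x, y


if __name__ == "__main__":
    x, y = make_dataset(n_samples=4, width=8)
    print(x)
    print(y)
```

Each output cell depends only on its three-cell neighbourhood in the input, which is the sense in which the teacher is local: changing an input cell far from site `i` leaves the label at site `i` unchanged.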