ML · DIS-NN · LG · Oct 5, 2023

Grokking as a First Order Phase Transition in Two Layer Networks

arXiv:2310.03789v3 · 46 citations · h-index: 20
Originality: Incremental advance
AI Analysis

This work offers a theoretical explanation of feature learning, a key aspect of deep learning. The advance is incremental because it builds on existing models and phase transition theory.

The paper investigates Grokking in deep neural networks, a phenomenon in which test accuracy increases suddenly. It applies the adaptive kernel approach to teacher-student models and maps Grokking onto a first-order phase transition, showing that after Grokking the network develops distinct internal representations.

A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in recently reported Grokking phenomena. While mainly reflected as a sudden increase in test accuracy, Grokking is also believed to be a beyond lazy-learning/Gaussian Process (GP) phenomenon involving feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models with cubic-polynomial and modular addition teachers. We provide analytical predictions on feature learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.
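To make the two teacher-student setups named in the abstract concrete, the following is a minimal sketch of how training data for each teacher might be generated. The abstract does not specify the teachers' exact form, so everything here is an assumption for illustration: the function names `cubic_teacher` and `modular_addition_data`, the parameter `eps`, the particular cubic `z + eps * (z**3 - 3 * z)`, and the one-hot encoding of the modular-addition inputs are hypothetical choices, not the paper's definitions.

```python
import numpy as np


def cubic_teacher(X, w_star, eps=0.3):
    """Hypothetical cubic-polynomial teacher (assumed form, not from the paper).

    The target is a linear function of the projection z = X @ w_star
    plus a cubic correction scaled by eps.
    """
    z = X @ w_star
    return z + eps * (z**3 - 3.0 * z)


def modular_addition_data(p):
    """All p*p input pairs (a, b) with target (a + b) mod p.

    Each input is the concatenation of two one-hot vectors of length p
    (an assumed encoding), so X has shape (p*p, 2*p).
    """
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    a, b = a.ravel(), b.ravel()
    rows = np.arange(p * p)
    X = np.zeros((p * p, 2 * p))
    X[rows, a] = 1.0        # one-hot encoding of a
    X[rows, p + b] = 1.0    # one-hot encoding of b
    y = (a + b) % p         # modular-addition teacher output
    return X, y
```

A student network would then be trained on a subset of these input-target pairs and evaluated on the held-out remainder, which is the setting in which a sudden rise in test accuracy can be observed.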

