A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
For researchers studying neural network generalization and language model training dynamics, this work provides a method to analyze delayed generalization in LLM pre-training, though it is an incremental extension of grokking concepts to a new setting.
The paper introduces an exposure-based framework to study grokking-like delayed generalization in LLM pre-training, demonstrating across five grammatical phenomena that generalization occurs after initial fitting, with grammatical concept vectors becoming more predictive and higher-dimensional post-generalization.
Grokking, the phenomenon in which neural networks generalize long after fitting their training data, has been studied in supervised settings with training over many epochs. LLM pre-training instead involves next-token prediction over an unlabeled corpus, with limited data repetition and no explicit train/validation split. To bridge this gap, we propose an exposure-based framework that enables the study of grokking-like dynamics during LLM pre-training. We ground our evaluation in BLiMP minimal pairs, which provide controlled grammatical contrasts. For every BLiMP minimal pair, we identify a critical phrase: the smallest contiguous span that captures the grammatical contrast and the phenomenon-relevant context. Examples whose critical phrase appears in the pre-training window are assigned to the proxy-train split; the remaining examples are assigned to the proxy-validation split. Across five grammatical phenomena, we observe delayed generalization: performance on the proxy-validation split improves only after the proxy-train split has been fit. Analyzing pre-training checkpoints before and after generalization shows that grammatical concept vectors become more predictive of grammatical acceptability and occupy a higher-dimensional subspace after generalization. We also find that attention from the critical token to the relevant context token is concentrated in a small number of heads.
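
The exposure-based split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `critical_phrase` field, the word-level matching rule, and the representation of the pre-training window as a list of document strings are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, keep only word tokens, and pad with spaces.

    Padding lets a substring test respect word boundaries.
    This matching rule is an assumption, not taken from the paper.
    """
    return " " + " ".join(re.findall(r"\w+", text.lower())) + " "


def split_by_exposure(
    examples: List[Dict[str, str]],
    pretraining_window: Iterable[str],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Assign each example to proxy-train or proxy-validation.

    An example goes to proxy-train if its critical phrase occurs in any
    document of the (hypothetical) pre-training window; otherwise it goes
    to proxy-validation.
    """
    docs = [normalize(doc) for doc in pretraining_window]
    proxy_train, proxy_val = [], []
    for example in examples:
        phrase = normalize(example["critical_phrase"])
        seen = any(phrase in doc for doc in docs)
        (proxy_train if seen else proxy_val).append(example)
    return proxy_train, proxy_val


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    examples = [
        {"id": "a", "critical_phrase": "the cats sleep"},
        {"id": "b", "critical_phrase": "the dogs barks"},
    ]
    window = ["Every night, the cats sleep on the sofa.", "The dog barks loudly."]
    train, val = split_by_exposure(examples, window)
    print([e["id"] for e in train], [e["id"] for e in val])
```

Exact phrase matching is the simplest choice here. A real pipeline over a pre-training corpus would likely need an index or a streaming search rather than a linear scan over documents.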