GrokAlign: Geometric Characterisation and Acceleration of Grokking
This work targets training efficiency and generalisation in deep learning and is most relevant to researchers and practitioners who encounter delayed generalisation. Its contribution appears incremental, since it builds on prior theoretical insights.
The paper tackles the problem of understanding and accelerating grokking, a phenomenon where deep networks exhibit delayed generalisation and emergent robustness. It shows that aligning a network's Jacobians with the training data ensures grokking under a low-rank Jacobian assumption. It introduces GrokAlign, a Jacobian regularisation method that empirically induces grokking much sooner than conventional regularisers like weight decay.
A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for using Jacobian regularisation when optimising deep networks. We introduce this method as GrokAlign and show empirically that it induces grokking much sooner than more conventional regularisers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. An accompanying webpage (https://thomaswalker1.github.io/blog/grokalign.html) and code (https://github.com/ThomasWalker1/grokalign) are available.
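To make "aligning a network's Jacobians with the training data (in the sense of cosine similarity)" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-hidden-layer ReLU network, for which the input-output Jacobian has a closed form, and it measures the cosine similarity between each Jacobian row and the input. The function names (`relu_mlp_jacobian`, `row_cosine_alignment`), the network sizes, and the choice to score every row are illustrative assumptions; the abstract does not specify which rows are aligned or how the term enters the loss.

```python
import numpy as np

def relu_mlp_jacobian(W1, b1, W2, x):
    """Input-output Jacobian of f(x) = W2 @ relu(W1 @ x + b1) at x.

    ReLU is piecewise affine, so within a linear region the Jacobian
    is W2 @ diag(active) @ W1, where `active` marks the firing units.
    """
    pre = W1 @ x + b1
    active = (pre > 0).astype(x.dtype)   # ReLU derivative: 1 if active, else 0
    return W2 @ (active[:, None] * W1)   # shape (n_outputs, n_inputs)

def row_cosine_alignment(J, x, eps=1e-12):
    """Cosine similarity between each row of J and the input x."""
    num = J @ x
    den = np.linalg.norm(J, axis=1) * np.linalg.norm(x) + eps
    return num / den

# Hypothetical toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 16, 3
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))
x = rng.normal(size=n_in)

J = relu_mlp_jacobian(W1, b1, W2, x)
alignment = row_cosine_alignment(J, x)
print(alignment)  # one value in [-1, 1] per network output
```

A value near 1 means the corresponding Jacobian row points in the direction of the input. A regulariser in this spirit would push the alignment towards 1 on training points, though the exact objective used by GrokAlign should be taken from the paper and its code.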