LGDec 13, 2024

The Complexity Dynamics of Grokking

arXiv:2412.09810v222 citationsh-index: 8Physica A: Statistical Mechanics and its Applications
Originality Incremental advance
AI Analysis

This work addresses the fundamental issue of how neural networks generalize, providing insights for machine learning researchers, but it is incremental as it builds on existing theories of compression and complexity.

The paper tackles the problem of understanding the grokking phenomenon in neural networks, where networks transition from memorization to generalization, by introducing a theoretical framework based on rate-distortion theory and Kolmogorov complexity to measure complexity, achieving compression ratios 30-40x better than naïve approaches.

We demonstrate the existence of a complexity phase transition in neural networks by studying the grokking phenomenon, where networks suddenly transition from memorization to generalization long after overfitting their training data. To characterize this phase transition, we introduce a theoretical framework for measuring complexity based on rate-distortion theory and Kolmogorov complexity, which can be understood as principled lossy compression for networks. We find that properly regularized networks exhibit a sharp phase transition: complexity rises during memorization, then falls as the network discovers a simpler underlying pattern that generalizes. In contrast, unregularized networks remain trapped in a high-complexity memorization phase. We establish an explicit connection between our complexity measure and generalization bounds, providing a theoretical foundation for the link between lossy compression and generalization. Our framework achieves compression ratios 30-40x better than naïve approaches, enabling precise tracking of complexity dynamics. Finally, we introduce a regularization method based on spectral entropy that encourages networks toward low-complexity representations by penalizing their intrinsic dimension.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes