LG ML · Oct 26, 2023

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

arXiv:2310.17247v215.518 citations · h-index: 5 · Has Code

Originality: Incremental advance

AI Analysis

This finding broadens machine learning researchers' understanding of grokking: the phenomenon is not specific to neural networks and may arise in any model whose solution search is guided by complexity and error. The advance is incremental in scope.

The paper finds that grokking, in which a model reaches high validation accuracy long after it has reached the same accuracy on the training set, also occurs outside neural networks, in settings such as Gaussian processes and linear regression. It also identifies a way to induce grokking on algorithmic datasets by adding dimensions that contain spurious information.

In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression, linear regression and Bayesian neural networks. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures shows that grokking is not restricted to settings considered in current theoretical and empirical studies. Instead, grokking may be possible in any model where solution search is guided by complexity and error.
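The idea of adding dimensions containing spurious information can be illustrated with a small data-construction sketch. The abstract does not specify the dataset or how the extra dimensions are generated, so everything here is an assumption made for illustration: the choice of a bit-parity task, the hypothetical function name make_parity_with_spurious, and the use of uniformly random bits for the spurious columns. The sketch only builds the data; it does not train a model or demonstrate grokking.

```python
import numpy as np


def make_parity_with_spurious(n_samples, n_bits, n_spurious, seed=0):
    """Build a parity dataset padded with uninformative input dimensions.

    Hypothetical helper for illustration only; not the paper's code.
    """
    rng = np.random.default_rng(seed)
    # Informative bits: the label is the parity of these columns.
    x_signal = rng.integers(0, 2, size=(n_samples, n_bits))
    y = x_signal.sum(axis=1) % 2
    # Spurious bits: sampled independently of the label, so they carry
    # no information about it (assumed form of "spurious information").
    x_spurious = rng.integers(0, 2, size=(n_samples, n_spurious))
    # Final inputs: informative columns first, spurious columns appended.
    x = np.concatenate([x_signal, x_spurious], axis=1)
    return x, y


x, y = make_parity_with_spurious(n_samples=8, n_bits=3, n_spurious=5)
```

Under these assumptions, a model trained on x must learn to ignore the last n_spurious columns in order to generalise, which is one way to picture a setting where fitting the training set and finding a simpler, generalising solution can happen at different times.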
