LG · AI · Mar 10, 2023

Unifying Grokking and Double Descent

arXiv:2303.06173v1 · 59 citations · h-index: 77
Originality: Incremental advance
AI Analysis

This work aims to unify disparate observations in deep learning generalization, potentially benefiting researchers in machine learning theory, but it appears incremental as it builds on prior studies of grokking and double descent.

The authors tackled the problem of understanding generalization in deep learning by hypothesizing that grokking and double descent are instances of the same learning dynamics, and they demonstrated model-wise grokking for the first time.

A principled understanding of generalization in deep learning may require unifying disparate observations under a single conceptual framework. Previous work has studied grokking, a training dynamic in which a sustained period of near-perfect training performance and near-chance test performance is eventually followed by generalization, as well as the superficially similar double descent. These topics have so far been studied in isolation. We hypothesize that grokking and double descent can be understood as instances of the same learning dynamics within a framework of pattern learning speeds. We propose that this framework also applies when varying model capacity instead of optimization steps, and provide the first demonstration of model-wise grokking.

Code Implementations: 1 repo