HEP-TH · DIS-NN · LG · Apr 17, 2025

A Two-Phase Perspective on Deep Learning Dynamics

arXiv:2504.12700v1 · 4 citations · h-index: 2 · Phys Rev E
Originality: Incremental advance
AI Analysis

This paper offers a unified theoretical framework for understanding generalization dynamics in deep learning, although the advance is incremental because it builds on existing concepts.

The authors propose that deep learning involves two phases—rapid curve fitting followed by slower compression—supported by aligning grokking, double descent, and information bottleneck phenomena, with mutual information emerging as a key progress measure.

We propose that learning in deep neural networks proceeds in two phases: a rapid curve fitting phase followed by a slower compression or coarse graining phase. This view is supported by the shared temporal structure of three phenomena: grokking, double descent and the information bottleneck, all of which exhibit a delayed onset of generalization well after training error reaches zero. We empirically show that the associated timescales align in two rather different settings. Mutual information between hidden layers and input data emerges as a natural progress measure, complementing circuit-based metrics such as local complexity and the linear mapping number. We argue that the second phase is not actively optimized by standard training algorithms and may be unnecessarily prolonged. Drawing on an analogy with the renormalization group, we suggest that this compression phase reflects a principled form of forgetting, critical for generalization.
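As an illustration of how mutual information between a hidden layer and the input data might be tracked during training, here is a minimal sketch of a binning-based plug-in estimator. It is not the authors' estimator: the binning scheme, the default of 30 bins, and the names `discretize`, `entropy_bits`, and `mutual_information_bits` are all assumptions made for this example. Binned estimates are also known to depend strongly on the bin count, so any curve produced this way should be read qualitatively.

```python
import numpy as np


def discretize(activations, n_bins=30):
    """Map each row of a (samples, units) activation matrix to an integer id.

    Hypothetical helper: activations are binned into n_bins equal-width bins
    over their global range, and each distinct binned row gets its own id.
    """
    lo, hi = activations.min(), activations.max()
    edges = np.linspace(lo, hi, n_bins + 1)
    # Use interior edges only, so bin indices fall in 0 .. n_bins - 1.
    binned = np.digitize(activations, edges[1:-1])
    _, ids = np.unique(binned, axis=0, return_inverse=True)
    return ids.reshape(-1)


def entropy_bits(ids):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete ids."""
    _, counts = np.unique(ids, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information_bits(x_ids, t_ids):
    """Plug-in estimate of I(X;T) = H(X) + H(T) - H(X,T) in bits."""
    joint = np.stack([x_ids, t_ids], axis=1)
    _, joint_ids = np.unique(joint, axis=0, return_inverse=True)
    return entropy_bits(x_ids) + entropy_bits(t_ids) - entropy_bits(joint_ids.reshape(-1))
```

In use, one would record a layer's activations on a fixed set of inputs at several training checkpoints, call `discretize` on each snapshot, and pass the result together with integer input ids to `mutual_information_bits`. Under the two-phase picture described above, a decline in this quantity after training error has reached zero would be the signature of the compression phase.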

