LG · AI · Apr 28

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

arXiv:2605.08119 · 26.7
AI Analysis

For researchers studying grokking and feature learning in neural networks, this work provides empirical validation of a theoretical mechanism and reveals activation-dependent spectral signatures, though the findings are incremental as they test an existing theorem on a specific setup.

This paper empirically tests Tian's repulsion theorem for grokking in two-layer networks. The predicted sign rule holds robustly across activations, but the spectral signature in the parameter updates is activation-dependent. With x² activation, a spectral detector fires in all 15 grokking seeds, with 229× late-stage magnitude separation; with ReLU it never fires. This matches the theoretical distinction between focused and spreading memorization.

Tian (2025) proves a repulsion theorem (Theorem 6) for the matrix $ B = (\widetilde{F}^\top \widetilde{F} + ηI)^{-1} $ during the interactive feature-learning stage of grokking: similar features have negative off-diagonal entries $ B_{j\ell} $, producing an effective repulsive force that drives them apart. However, the theorem does not specify when this mechanism becomes empirically observable, nor whether it leaves a measurable spectral signature in the parameter updates. We test this directly on Tian's modular addition setup ($ M = 71 $, $ K = 2048 $, MSE loss) and observe a clear structure-mechanism dissociation. The predicted sign rule holds robustly on the top-200 most-similar feature pairs across activations (empirical sign-match rising from 0.865 to 0.985 on $ σ= x^2 $ across 5 seeds, and saturating at 1.000 on $ σ= \operatorname{ReLU} $). However, the spectral signature in the parameter updates is strongly activation-dependent. With $ σ= x^2 $, a simple slope detector on the rolling eigengap $ σ_2 / σ_3 $ of $ ΔW $ fires in 15/15 grokking seeds at epoch 174 (IQR [173,174]) and in 0/15 non-grokking controls, with 229$ \times $ late-stage magnitude separation; the spectrum is rank-2. In contrast, with $ σ= \operatorname{ReLU} $, the detector never fires and the spectrum remains effectively rank-1. This dissociation aligns with Tian's Theorem 5 distinction between focused (power-law) and spreading (ReLU) memorization: while the sign structure of $ B $ depends only on $ \widetilde{F}^\top \widetilde{F} $, how feature repulsion translates into weight updates critically depends on the activation derivative $ σ' $.
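
The two quantities in the abstract can be illustrated with a short NumPy sketch. This is not the authors' code. It assumes the columns of $ \widetilde{F} $ are feature vectors, defines sign-match as the fraction of the most-similar pairs (largest off-diagonal Gram entries) with negative $ B_{j\ell} $, and implements the slope detector as a least-squares slope over a trailing window. The function names and the `window` and `threshold` values are hypothetical; the abstract does not state the detector's settings.

```python
import numpy as np


def repulsion_matrix(F_tilde: np.ndarray, eta: float) -> np.ndarray:
    """B = (F~^T F~ + eta * I)^{-1}.

    Assumption: F_tilde has shape (n_samples, K), one feature per column.
    """
    K = F_tilde.shape[1]
    gram = F_tilde.T @ F_tilde
    return np.linalg.inv(gram + eta * np.eye(K))


def sign_match(F_tilde: np.ndarray, eta: float, top_k: int = 200) -> float:
    """Fraction of the top_k most-similar feature pairs with B_jl < 0.

    Hypothetical metric: similarity is the off-diagonal Gram entry.
    """
    gram = F_tilde.T @ F_tilde
    B = repulsion_matrix(F_tilde, eta)
    rows, cols = np.triu_indices(gram.shape[0], k=1)  # each pair (j < l) once
    sims = gram[rows, cols]
    top = np.argsort(sims)[::-1][:top_k]  # indices of the largest similarities
    return float(np.mean(B[rows[top], cols[top]] < 0))


def eigengap_ratio(delta_W: np.ndarray) -> float:
    """sigma_2 / sigma_3 of a single weight update.

    delta_W needs at least three singular values (min dimension >= 3).
    """
    s = np.linalg.svd(delta_W, compute_uv=False)  # sorted in descending order
    return float(s[1] / max(s[2], 1e-12))  # guard against a vanishing sigma_3


def first_firing_epoch(ratios, window: int = 10, threshold: float = 1.0):
    """First epoch index whose trailing-window slope exceeds threshold.

    Returns None if the detector never fires. window and threshold are
    illustrative values, not the paper's settings.
    """
    ratios = np.asarray(ratios, dtype=float)
    t = np.arange(window)
    for end in range(window, len(ratios) + 1):
        # Least-squares slope of the ratio over the last `window` epochs.
        slope = np.polyfit(t, ratios[end - window:end], 1)[0]
        if slope > threshold:
            return end - 1
    return None
```

For two unit-norm features with overlap $ c > 0 $, the inverse can be written out by hand: $ B_{12} = -c / ((1+η)^2 - c^2) < 0 $, which is the repulsive sign the theorem predicts for similar features. In a training loop, one would call `eigengap_ratio` on each epoch's $ ΔW $ and pass the resulting sequence to `first_firing_epoch`.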

