LGSep 22, 2025

Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking

arXiv:2509.17738v25 citationsh-index: 21
Originality Highly original
AI Analysis

This work addresses a foundational problem in machine learning by clarifying the causal roles of geometric properties in generalization, with implications for understanding and improving model training.

The study investigated whether neural collapse or loss landscape flatness is necessary for generalization in deep networks, using grokking to separate memorization from generalization, and found that only flatness consistently predicts generalization, with models regularized away from flat solutions showing delayed generalization.

Neural collapse, i.e., the emergence of highly symmetric, class-wise clustered representations, is frequently observed in deep networks and is often assumed to reflect or enable generalization. In parallel, flatness of the loss landscape has been theoretically and empirically linked to generalization. Yet, the causal role of either phenomenon remains unclear: Are they prerequisites for generalization, or merely by-products of training dynamics? We disentangle these questions using grokking, a training regime in which memorization precedes generalization, allowing us to temporally separate generalization from training dynamics and we find that while both neural collapse and relative flatness emerge near the onset of generalization, only flatness consistently predicts it. Models encouraged to collapse or prevented from collapsing generalize equally well, whereas models regularized away from flat solutions exhibit delayed generalization, resembling grokking, even in architectures and datasets where it does not typically occur. Furthermore, we show theoretically that neural collapse leads to relative flatness under classical assumptions, explaining their empirical co-occurrence. Our results support the view that relative flatness is a potentially necessary and more fundamental property for generalization, and demonstrate how grokking can serve as a powerful probe for isolating its geometric underpinnings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes