LGAICLFeb 17

Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks

arXiv:2602.15997v11 citations
Originality Incremental advance
AI Analysis

This work provides mechanistic insights into capability emergence in neural networks, which is incremental for understanding training dynamics in AI.

The researchers investigated how capabilities emerge during neural network training by tracking geometric measures across multiple model scales and tasks, finding that training begins with a universal representation collapse to scale-invariant task-specific floors and propagates top-down through layers, contradicting bottom-up intuition. They also identified that representation geometry leads emergence with 75-100% precursor rates for hard tasks, while prediction limits exist for fine-grained timing.

Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K-85M parameters), 120+ emergence events in eight algorithmic tasks, and three Pythia language models (160M-2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210X parameter range (e.g., modular arithmetic collapses to RANKME ~ 2.0 regardless of model size); (2) collapse propagates top-down through layers (32/32 task X model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (75-100% precursor rate for hard tasks), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance 27%; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric patterns replicate but per-task precursor signals do not -- the precursor relationship requires task-training alignment that naturalistic pre-training does not provide. Our contribution is the geometric anatomy of emergence and its boundary conditions, not a prediction tool.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes