LG · AI · May 11

Flag Varieties: A Geometric Framework for Deep Network Alignment

arXiv:2605.09861 · 44.9

AI Analysis

Provides a principled theoretical foundation for alignment phenomena in deep learning, replacing previously ad-hoc explanations with a single framework for practitioners studying network dynamics and representation geometry.

The paper derives a unified geometric framework for layerwise alignment in deep networks using flag varieties, proving that subspace intersection dimension is the unique reparameterization-invariant observable. It shows that ridge regularization drives subspace alignment at an exponential rate set by weight decay, while nonlinear activations create a commutator obstruction to exact basis alignment. Together these results explain the Level-2/3 hierarchy in Neural Collapse from first principles.

Alignment, the tendency of adjacent weight matrices in deep networks to develop compatible subspace orientations, underlies gradient flow, Neural Collapse, and representation similarity across architectures. Despite extensive empirical documentation, these phenomena have resisted unified theoretical treatment: existing explanations are post-hoc, each fitted to a specific observation with whatever mathematics is at hand. We reverse this direction by deriving the mathematical structure that layerwise alignment inherently demands. Using geometric invariant theory, we prove that alignment geometry has a canonical closed, polystable stratum given by a flag variety, and that subspace intersection dimension is its unique reparameterization-invariant observable, establishing that subspace metrics are not empirical conventions but mathematical necessities. This unified framework yields two dynamical consequences: ridge regularization drives subspace alignment at an exponential rate set by weight decay, whereas nonlinear activations induce a commutator obstruction to exact basis alignment, generically present in nonlinear networks and absent in linear ones. Together these give a geometric explanation of the Level-2/3 hierarchy in Neural Collapse from first principles rather than post-hoc analysis. The commutator magnitude and head subspace overlap further serve as weight-space windows into internal alignment structure, requiring no forward passes. Experiments on multilayer perceptrons, residual networks, and pretrained language models support the proposed diagnostics and delineate their scope.
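The abstract names two diagnostics that can be read off the weights alone: subspace intersection dimension and commutator magnitude. The sketch below is one plausible way to compute such quantities with NumPy and is not the paper's implementation. The helper names (`top_subspace`, `principal_cosines`, `intersection_dim`, `commutator_norm`), the choice of top-k singular subspaces, the tolerance `tol`, and the use of the Frobenius norm are all assumptions made for illustration. The abstract does not specify which matrices enter the commutator, so `commutator_norm` simply takes any two square matrices.

```python
import numpy as np

def top_subspace(W, k, side):
    """Orthonormal basis of the top-k left ("left") or right ("right") singular subspace of W."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] if side == "left" else Vt[:k, :].T

def principal_cosines(Q1, Q2):
    """Cosines of the principal angles between the column spans of orthonormal Q1 and Q2."""
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def intersection_dim(Q1, Q2, tol=1e-6):
    """Numerical intersection dimension: the number of principal cosines equal to 1 within tol."""
    return int(np.sum(principal_cosines(Q1, Q2) > 1.0 - tol))

def commutator_norm(A, B):
    """Frobenius norm of AB - BA for square A and B (an assumed proxy for commutator magnitude)."""
    return float(np.linalg.norm(A @ B - B @ A))

# Toy adjacent layers in R^5 (hypothetical data, not from the paper):
# W_l writes into span(e1, e2, e3); W_next reads from span(e1, e2, e4).
rng = np.random.default_rng(0)
E = np.eye(5)
W_l = E[:, [0, 1, 2]] @ rng.standard_normal((3, 4))       # shape (5, 4)
W_next = rng.standard_normal((4, 3)) @ E[:, [0, 1, 3]].T  # shape (4, 5)

out_space = top_subspace(W_l, 3, "left")     # output subspace of layer l
in_space = top_subspace(W_next, 3, "right")  # input subspace of layer l+1
shared = intersection_dim(out_space, in_space)
```

By construction the two toy subspaces share exactly the directions e1 and e2, so `shared` is 2. On trained networks the principal cosines are rarely exactly 1, so the result depends on `tol`; inspecting the full output of `principal_cosines` may be more informative than a single thresholded count.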
