LG · ML · Jun 4

Dead Directions: Geometric Singular Learning

arXiv:2606.05957


AI Analysis

For researchers in singular learning theory and deep learning theory, this paper provides a practical, coordinate-based method for computing Bayesian invariants without resolution of singularities or posterior sampling.

This paper bridges singular learning theory and information geometry through the 'dead direction' concept, showing that the KL order along a singular direction can be recovered from the decay rate of the Fisher metric's directional curvature in original coordinates. The authors extend this recovery to multi-component crossings and deep networks, and propose DDCAdam, a $G$-equivariant Adam-family preconditioner that respects the singular geometry. The framework yields closed-form per-architecture predictions and a readout of Watanabe's invariants from a single checkpoint.

Singular learning theory and information geometry have studied the same parameter spaces in mostly separate vocabularies: the former computes Bayesian invariants in resolved coordinates, the latter works in original coordinates under a non-degeneracy assumption that overparameterised models routinely violate. We bridge them through one primitive, the dead direction: a unit vector along which the Fisher metric degenerates, equivalently a tangent to the analytic singular set with a definite KL order, set by how fast the KL divergence vanishes. The two readings name the same vector; our central move shows its KL order is recoverable as the decay rate of the directional Fisher curvature approaching the singularity, in original parameter coordinates and without a Hironaka resolution. A selection rule on smooth fibres translates this rate into Watanabe's single-direction contribution to the real log canonical threshold, and we extend the recovery to multi-component crossings, multiplicity $m$, the singular fluctuation $ν$ (universal in the KL order for 1D directions), prior-RLCT shifts, and tempered posteriors. We then lift this rate to a deep network: a multi-layer K-FAC factorisation writes each Fisher block as a product of activation- and gradient-side rates with a duality between them, instantiated at modern-network primitives (residual streams, layer normalisation, attention). A quotient theorem carries the rate to the gauge quotient $Θ/G$ under gradient flow on a $G$-invariant metric; SGD qualifies, standard Adam does not, and we construct a $G$-equivariant Adam-family preconditioner (DDCAdam) that does. The bridge yields a parameter-coordinate handle on singular geometry, closed-form per-architecture predictions, and a trajectory-rate readout of Watanabe's triple $(λ, m, ν)$ from one checkpoint's forward and backward passes, without posterior sampling.
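The central claim of the abstract, that the KL order can be read off from how fast the directional Fisher curvature decays near the singularity, can be illustrated on a one-parameter toy model. The sketch below is not the paper's construction: the model $y = w^k x + \varepsilon$ with unit-variance Gaussian noise and true parameter $w = 0$, and the helper names `fisher_along_direction` and `estimate_kl_order`, are assumptions chosen for illustration. In this toy case a direct calculation gives $K(w) = \tfrac{1}{2} w^{2k}\,\mathbb{E}[x^2]$ and Fisher information $k^2 w^{2k-2}\,\mathbb{E}[x^2]$, so the KL order equals the Fisher decay exponent plus 2, and the one-dimensional real log canonical threshold is the reciprocal of the KL order.

```python
import numpy as np


def fisher_along_direction(w, k, x):
    """Empirical Fisher information of the toy model y = w**k * x + N(0, 1) at w."""
    grad = k * w ** (k - 1) * x  # derivative of the model output with respect to w
    return np.mean(grad ** 2)


def estimate_kl_order(k, x, ws):
    """Fit the log-log decay slope of the Fisher curvature as w -> 0, then add 2.

    The "+ 2" is specific to this toy model (KL ~ w^(2k), Fisher ~ w^(2k-2));
    it is an illustrative assumption, not the paper's selection rule.
    """
    fisher = np.array([fisher_along_direction(w, k, x) for w in ws])
    slope, _ = np.polyfit(np.log(ws), np.log(fisher), 1)
    return slope + 2.0


rng = np.random.default_rng(0)
x = rng.normal(size=10_000)        # fixed input sample
ws = np.geomspace(1e-3, 1e-1, 20)  # points approaching the singularity at w = 0

for k in (1, 2, 3):
    order = estimate_kl_order(k, x, ws)
    rlct = 1.0 / order             # 1D RLCT for K(w) proportional to w^order
    print(f"k={k}: estimated KL order = {order:.3f}, implied RLCT = {rlct:.3f}")
```

For $k = 1$ the model is regular and the estimate reduces to the familiar value $\lambda = 1/2$ for one parameter; larger $k$ gives a flatter KL minimum and a smaller $\lambda$. Only gradients in the original coordinate $w$ are used, which is the sense in which no resolution of singularities or posterior sampling is needed in this toy setting.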
