LGDGMay 4

Geometric and Spectral Alignment for Deep Neural Network I

arXiv:2605.0210833.5
AI Analysis

For deep learning theorists, this work provides a rigorous geometric framework for understanding spectral properties of deep residual networks, though it is incremental in building on existing near-identity and product-of-Jacobian models.

This paper models deep residual networks as products of near-identity Jacobians and proves geometric and spectral alignment results, including a rigidity theorem that bounds exponent drift by (log M)/L in the zero-slack case, with slack and residual increments augmenting the bound. It provides deterministic estimates for singular spectra of layer factors, establishing connections to Fisher information, Bures-Wasserstein geometry, and effective rank.

Deep residual architectures are modeled as products of near-identity Jacobians. This paper proves deterministic quotient-geometric estimates for singular spectra of Frobenius-normalized layer factors, emphasizing a normalized top-radial Cartan coordinate and fitted power-law chart. Full-rank factors are mapped from $\mathrm{GL}(d)$ to the positive cone by $A\mapsto A^\top A$, then to ordered eigenvalue data. Under Frobenius normalization, exact power-law spectra form a trace-normalized Cartan orbit. This orbit is a Gibbs family on ranks, a Fisher information line, and a Bures--Wasserstein curve with line element $d/4$ times Fisher information. The main rigidity theorem is a slack-aware margin inequality: interface radial amplitude, non-backtracking slack, and signed residual variation control displacement of the fitted Cartan coordinate. In the exact-chart zero-slack case, a depth-$L$ budget gives exponent drift of order $(\log M)/L$; generally, slack and residual increments augment the bound. We separate scalar top-radial from full-Cartan spectral control, which also needs Bures/Hellinger residual variation. We prove approximate-power-law and metric-chart versions, converse lower bounds, Fisher--KL/Bures action estimates, and near-identity expansions for normalized residual chains. Near-identity results verify transport budgets; chart quality remains measurable. Effective rank is a spectral-energy quantile, giving finite-width power-law tail bounds and robust rank-window transition estimates. Empirical static-weight exponent profiles serve as diagnostics; full verification also requires interface budgets, slacks, and residuals for the same operator chain.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes