LG DG · May 4

Geometric and Spectral Alignment for Deep Neural Network I

Ziran Liu, Wei Wang, Jinhao Wang, Pengcheng Wang, Xinyi Sui, Cihan Ruan, Nam Ling, Wei Jiang

arXiv:2605.02108 · 33.5

AI Analysis

For deep learning theorists, this work provides a rigorous geometric framework for understanding the spectral properties of deep residual networks, though it builds incrementally on existing near-identity and product-of-Jacobian models.

This paper models deep residual networks as products of near-identity Jacobians and proves geometric and spectral alignment results. Its main rigidity theorem gives exponent drift of order $(\log M)/L$ for a depth-$L$ budget in the exact-chart zero-slack case; in general, slack and residual increments add to the bound. It also provides deterministic estimates for the singular spectra of Frobenius-normalized layer factors and connects them to Fisher information, Bures--Wasserstein geometry, and effective rank.

Deep residual architectures are modeled as products of near-identity Jacobians. This paper proves deterministic quotient-geometric estimates for singular spectra of Frobenius-normalized layer factors, emphasizing a normalized top-radial Cartan coordinate and fitted power-law chart. Full-rank factors are mapped from $\mathrm{GL}(d)$ to the positive cone by $A\mapsto A^\top A$, then to ordered eigenvalue data. Under Frobenius normalization, exact power-law spectra form a trace-normalized Cartan orbit. This orbit is a Gibbs family on ranks, a Fisher information line, and a Bures--Wasserstein curve with line element $d/4$ times Fisher information. The main rigidity theorem is a slack-aware margin inequality: interface radial amplitude, non-backtracking slack, and signed residual variation control displacement of the fitted Cartan coordinate. In the exact-chart zero-slack case, a depth-$L$ budget gives exponent drift of order $(\log M)/L$; generally, slack and residual increments augment the bound. We separate scalar top-radial from full-Cartan spectral control, which also needs Bures/Hellinger residual variation. We prove approximate-power-law and metric-chart versions, converse lower bounds, Fisher--KL/Bures action estimates, and near-identity expansions for normalized residual chains. Near-identity results verify transport budgets; chart quality remains measurable. Effective rank is a spectral-energy quantile, giving finite-width power-law tail bounds and robust rank-window transition estimates. Empirical static-weight exponent profiles serve as diagnostics; full verification also requires interface budgets, slacks, and residuals for the same operator chain.
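The spectral quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's procedure. The function names, the log-log least-squares fit standing in for the "fitted power-law chart", and the energy threshold `q` are all assumptions made here for illustration. The sketch maps a factor $A$ to the ordered eigenvalues of $A^\top A$, rescales them to unit trace (equivalent to Frobenius-normalizing $A$), fits an exponent $\alpha$ in $p_k \propto k^{-\alpha}$, and reads off an effective rank as a spectral-energy quantile.

```python
import numpy as np


def normalized_spectrum(A):
    """Ordered eigenvalues of A^T A, rescaled to sum to 1.

    Rescaling to unit trace is the same as dividing A by its Frobenius norm.
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    lam = s ** 2                            # eigenvalues of A^T A
    return lam / lam.sum()


def fit_power_law_exponent(p):
    """Assumed fitting rule: least-squares slope of log p_k against log k.

    Returns alpha such that p_k is approximately proportional to k^(-alpha).
    Requires every p_k > 0 (a full-rank factor).
    """
    k = np.arange(1, len(p) + 1)
    slope, _intercept = np.polyfit(np.log(k), np.log(p), 1)
    return -slope


def effective_rank(p, q=0.9):
    """Smallest r whose top-r eigenvalues carry at least a fraction q of the
    total spectral energy. The threshold q is a hypothetical parameter."""
    cum = np.cumsum(p)
    r = int(np.searchsorted(cum, q, side="left")) + 1
    return min(r, len(p))  # guard against round-off when q is close to 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64
    # Build a factor whose squared singular values follow k^(-1.5) exactly.
    U, _ = np.linalg.qr(rng.standard_normal((d, d)))
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    A = U @ np.diag(np.arange(1, d + 1) ** -0.75) @ V.T

    p = normalized_spectrum(A)
    print("fitted exponent:", fit_power_law_exponent(p))
    print("effective rank at q=0.9:", effective_rank(p, q=0.9))
```

Applied to the static weights of each layer, a sketch like this only yields an exponent profile. As the abstract notes, such profiles are diagnostics, and full verification also requires the interface budgets, slacks, and residuals for the same operator chain.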
