Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment
This work offers a mechanistic account of implicit bias in deep learning that builds on prior balanced deep-linear analyses without requiring balancing.
The paper proposes two signatures of deep Jacobians, depth-induced exponential scaling of ordered singular values and strong spectral separation, and shows that they lead to singular-vector alignment and approximately decoupled singular-value dynamics; experiments in fixed-gates settings validate these predictions.
Understanding why gradient-based training in deep networks exhibits strong implicit bias remains challenging, in part because tractable singular-value dynamics are typically available only for balanced deep linear models. We propose an alternative route based on two theoretically grounded and empirically testable signatures of deep Jacobians: depth-induced exponential scaling of ordered singular values and strong spectral separation. Adopting a fixed-gates view of piecewise-linear networks, where Jacobians reduce to products of masked linear maps within a single activation region, we prove the existence of Lyapunov exponents governing the top singular values at initialization, give closed-form expressions in a tractable masked model, and quantify finite-depth corrections. We further show that sufficiently strong separation forces singular-vector alignment in matrix products, yielding an approximately shared singular basis for intermediate Jacobians. Together, these results motivate an approximation regime in which singular-value dynamics become effectively decoupled, mirroring classical balanced deep-linear analyses without requiring balancing. Experiments in fixed-gates settings validate the predicted scaling, alignment, and resulting dynamics, supporting a mechanistic account of emergent low-rank Jacobian structure as a driver of implicit bias.
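As a rough illustration of depth-induced exponential scaling, the following minimal sketch estimates the top Lyapunov exponents of a product of masked Gaussian matrices using the standard QR iteration. The ensemble is an assumption made for illustration and is not necessarily the masked model analyzed in the paper: i.i.d. Gaussian weights with variance 1/width and a fixed number of active units per layer. The function name `masked_lyapunov_exponents` and its parameters are hypothetical.

```python
import numpy as np

def masked_lyapunov_exponents(width=8, active=4, k=2, depth=2000, seed=0):
    """Estimate the top-k Lyapunov exponents of a product of masked linear maps.

    Each layer is D @ W, where W has i.i.d. N(0, 1/width) entries and D is a
    diagonal 0/1 mask with exactly `active` ones (an illustrative assumption).
    Requires k <= active so the tracked subspace never collapses.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal frame whose growth we track through the layers.
    Q, _ = np.linalg.qr(rng.standard_normal((width, k)))
    log_growth = np.zeros(k)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        mask = np.zeros(width)
        mask[rng.choice(width, size=active, replace=False)] = 1.0
        # Apply the masked layer, then re-orthonormalize; |diag(R)| records
        # the per-layer stretch along each tracked direction.
        Q, R = np.linalg.qr((mask[:, None] * W) @ Q)
        log_growth += np.log(np.abs(np.diag(R)))
    # Average log-growth per layer approximates the Lyapunov exponents.
    return log_growth / depth

exponents = masked_lyapunov_exponents()
print(exponents)
```

If the i-th singular value of the depth-L product scales like exp(L * lambda_i), a gap between consecutive exponents means the ratio of consecutive singular values grows exponentially with depth, which is the spectral separation the abstract refers to. In this toy ensemble the first estimated exponent should exceed the second by a clear margin.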