LG · May 6

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

arXiv:2605.05113
AI Analysis

For practitioners of recurrent sequence models, this work pinpoints when finite-width effects invalidate infinite-width signal propagation theory, revealing that recurrent models accumulate finite-width effects more rapidly with depth than feedforward ones.

The paper derives exact finite-width formulas for signal propagation in linear recurrent models and identifies three joint depth-width scaling regimes: subcritical (t=o(√n)), critical (t~c√n), and supercritical (t≫√n). Infinite-width theory breaks down at recurrent depth t~√n, which in turn indicates when standard initializations such as Glorot become unstable.

We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it remains unclear for how long that approximation remains accurate when recurrent depth $t$ grows jointly with width $n$. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large $t$. We derive exact finite-width formulas for the hidden state signal energies in linear recurrences under complex Gaussian initialization. Using these formulas, we identify the joint depth-width scaling regimes that govern signal propagation: (i) a subcritical regime $t=o(\sqrt n)$, in which the infinite-width approximation remains valid; (ii) a critical regime $t\sim c\sqrt n$, in which non-negligible deviations from infinite-width predictions appear and a nontrivial joint scaling limit emerges; and (iii) a supercritical regime $t\gg \sqrt n$, in which finite-width effects dominate. Thus, our results pinpoint the precise recurrent depth scale at which infinite-width theory breaks down in long-range linear recurrences. In turn, this shows when standard initialization schemes, such as Glorot, become unstable. More broadly, our results demonstrate that finite-width effects accumulate more rapidly with depth in recurrent models than in feedforward ones, leading to qualitatively different signal propagation behavior.
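The depth-width interplay described in the abstract can be probed numerically. The following is a minimal Monte Carlo sketch, not the paper's method: it assumes an input-free recurrence $h_t = W h_{t-1}$ with a unit-norm initial state and i.i.d. complex Gaussian entries of variance $1/n$, so that the infinite-width prediction for the signal energy $\mathbb{E}\|h_t\|^2$ is 1 at every depth. The function name `signal_energy` and all parameter choices are hypothetical, and the paper's exact setup (inputs, normalization, the precise energy being tracked) may differ.

```python
import numpy as np

def signal_energy(n, t_max, trials=200, seed=0):
    """Monte Carlo estimate of E||h_t||^2 for t = 0..t_max.

    Assumed model (illustrative, not taken from the paper):
    h_t = W h_{t-1}, ||h_0|| = 1, W_ij i.i.d. complex Gaussian
    with E|W_ij|^2 = 1/n.
    """
    rng = np.random.default_rng(seed)
    energies = np.zeros(t_max + 1)
    for _ in range(trials):
        # Real and imaginary parts each have variance 1/(2n),
        # so every entry has total variance 1/n.
        W = (rng.standard_normal((n, n))
             + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
        h = np.zeros(n, dtype=complex)
        h[0] = 1.0  # unit-norm initial hidden state
        energies[0] += 1.0
        for t in range(1, t_max + 1):
            h = W @ h
            energies[t] += np.vdot(h, h).real  # squared norm of h_t
    return energies / trials

if __name__ == "__main__":
    n = 64
    t_max = 4 * int(np.sqrt(n))  # reach a few multiples of sqrt(n)
    energy = signal_energy(n, t_max)
    for t in range(0, t_max + 1, 4):
        # Report depth in units of sqrt(n) next to the estimated energy.
        print(f"t={t:3d}  t/sqrt(n)={t / np.sqrt(n):4.2f}  E||h_t||^2 ~ {energy[t]:.3f}")
```

Running the script for several widths and plotting the estimate against $t/\sqrt n$ is one way to see where the curves leave the infinite-width value of 1. The estimates become noisy at large $t$, so the number of trials may need to grow with depth.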
