Propagation of Chaos in Contextual Flow Maps

Shi Chen, Zhengjiang Lin, Kaizhao Liu, Philippe Rigollet

arXiv:2605.16747

AI Analysis

Provides rigorous statistical guarantees for transformer performance as context length grows, addressing a key theoretical gap for practitioners scaling models.

This paper develops a quantitative theory of transformers in the large-context regime using contextual flow maps (CFMs). It proves that finite-context models approximate their infinite-context counterparts at the optimal Wasserstein rate $n^{-1/d}$ for general CFMs, and at the parametric rate $n^{-1/2}$ for a restricted class of CFMs that includes transformers.

We develop a quantitative statistical theory of transformers in the large-context regime by adopting the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. Within this framework, the finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so that the context length $n$ becomes a statistical resource. Exploiting the McKean--Vlasov structure of the dynamics and the classical machinery of propagation of chaos, we establish a forward bound controlling the deviation between the finite- and infinite-context CFMs uniformly along depth, and a backward bound controlling the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent. Both bounds achieve the optimal Wasserstein rate $n^{-1/d}$ for general CFMs and parametric rate $n^{-1/2}$ for a restricted class of CFMs that includes transformers as a special case. The analysis rests on a new Eulerian adjoint formulation of the loss gradient and stability estimates for the resulting forward--adjoint system, both of which may be of independent interest.
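The parametric rate $n^{-1/2}$ can be illustrated numerically in a toy setting. The sketch below is not the paper's CFM and does not reproduce its proofs. It assumes a single softmax attention read in one dimension, with identity query, key, and value weights, and a context of $n$ i.i.d. standard Gaussian tokens. Under these assumptions the infinite-context output for a query $x$ is the mean of the exponentially tilted Gaussian, which equals $x$, so the finite-context error can be measured exactly. The function names and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def attention_output(x, context):
    """Softmax attention of a scalar query x over a 1D context.

    Assumption: query, key and value weights are all the identity,
    so the attention score of token y is simply x * y.
    """
    weights = [math.exp(x * y) for y in context]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, context)) / total


def rms_error(x, n, trials, rng):
    """Root-mean-square gap between finite- and infinite-context outputs.

    For a standard Gaussian context the infinite-context output is x,
    because tilting N(0, 1) by exp(x * y) gives N(x, 1).
    """
    squared = 0.0
    for _ in range(trials):
        context = [rng.gauss(0.0, 1.0) for _ in range(n)]
        squared += (attention_output(x, context) - x) ** 2
    return math.sqrt(squared / trials)


def fitted_rate(ns, errors):
    """Least-squares slope of log(error) against log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den


rng = random.Random(0)          # fixed seed so the run is repeatable
query = 0.5                     # hypothetical query token
ns = [100, 400, 1600]           # context lengths to compare
errors = [rms_error(query, n, 300, rng) for n in ns]
slope = fitted_rate(ns, errors)

for n, err in zip(ns, errors):
    print(f"n = {n:5d}   rms error = {err:.4f}")
print(f"fitted exponent = {slope:.2f}")
```

If the toy model behaves as expected, the fitted exponent should land near $-1/2$, meaning that quadrupling the context length roughly halves the error. This reflects only the fact that a single attention read is a ratio of two empirical averages. The uniform-in-depth and uniform-in-training bounds described in the abstract are much stronger statements that this sketch does not test.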
