As Language Models Scale, Low-order Linear Depth Dynamics Emerge

arXiv:2603.12541

AI Analysis

The paper offers a systems-theoretic foundation for analyzing and controlling large language models, which are often treated as black boxes. The findings are incremental, however, since they build on existing scaling principles.

The paper shows that transformer depth dynamics can be accurately approximated by low-order linear surrogates within context: a 32-dimensional surrogate reproduces GPT-2-large's layerwise sensitivity profile with near-perfect agreement across tasks such as toxicity and sentiment. Agreement improves with model size across the GPT-2 family, and the surrogate enables multi-layer interventions that require less energy than standard heuristic schedules.

Large language models are often viewed as high-dimensional nonlinear systems and treated as black boxes. Here, we show that transformer depth dynamics admit accurate low-order linear surrogates within context. Across tasks including toxicity, irony, hate speech and sentiment, a 32-dimensional linear surrogate reproduces the layerwise sensitivity profile of GPT-2-large with near-perfect agreement, capturing how the final output shifts under additive injections at each layer. We then uncover a surprising scaling principle: for a fixed-order linear surrogate, agreement with the full model improves monotonically with model size across the GPT-2 family. This linear surrogate also enables principled multi-layer interventions that require less energy than standard heuristic schedules when applied to the full model. Together, our results reveal that as language models scale, low-order linear depth dynamics emerge within contexts, offering a systems-theoretic foundation for analyzing and controlling them.
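The abstract does not spell out the form of the surrogate, so the following is only a minimal sketch under an assumed generic discrete-time linear state-space model over depth: x_{l+1} = A x_l + B u_l, with readout y = C x_L. Here the matrices A, B, C, the function names, and the use of a pseudoinverse for the minimum-energy schedule are all illustrative assumptions, not the authors' method. Under this assumption, the output shift caused by an additive injection u_l at layer l is C A^(L-1-l) B u_l, and the lowest-energy set of injections that achieves a target output shift is the minimum-norm solution of the stacked sensitivities.

```python
import numpy as np


def layer_sensitivities(A, B, C, num_layers):
    """Return S[l] = C @ A^(L-1-l) @ B for l = 0..L-1 (assumed surrogate).

    S[l] maps an additive injection at layer l to the shift in the output.
    """
    sens = [None] * num_layers
    M = C.copy()  # running product C @ A^k, starting at k = 0
    for l in range(num_layers - 1, -1, -1):
        sens[l] = M @ B
        M = M @ A
    return sens


def simulate(A, B, C, x0, injections):
    """Roll the assumed linear surrogate through depth and read out y."""
    x = x0
    for u in injections:  # one injection vector per layer
        x = A @ x + B @ u
    return C @ x


def min_energy_schedule(A, B, C, num_layers, target_shift):
    """Minimum-norm injections (sum of squared entries) hitting target_shift.

    Stacks the per-layer sensitivities side by side and applies the
    Moore-Penrose pseudoinverse; assumes the target is reachable.
    """
    G = np.hstack(layer_sensitivities(A, B, C, num_layers))
    u_flat = np.linalg.pinv(G) @ target_shift
    return u_flat.reshape(num_layers, -1)  # row l is the injection at layer l


# Small illustrative demo with random matrices (hypothetical sizes).
rng = np.random.default_rng(0)
order, inj_dim, depth = 4, 3, 6
A = 0.5 * rng.standard_normal((order, order))
B = rng.standard_normal((order, inj_dim))
C = rng.standard_normal((1, order))

schedule = min_energy_schedule(A, B, C, depth, np.array([1.0]))
print("per-layer injection norms:", np.linalg.norm(schedule, axis=1))
print("total energy:", float(np.sum(schedule ** 2)))
```

In this sketch, any single-layer injection that reaches the same target is also a feasible solution, so the multi-layer schedule can never cost more energy than it within the surrogate. Whether that advantage carries over to the full model is the empirical claim the abstract makes.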
