Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management

arXiv:2605.19514

Predicted impact: top 50% in AI (last 90 days) · Originality: Synthesis-oriented

AI Analysis

For researchers and practitioners working on LLMs, this paper clarifies a common misinterpretation about Transformer Turing-completeness and highlights the central role of context management in determining computational power.

This paper points out that existing proofs of Transformer Turing-completeness often assume a scaling-family setting, in which different models handle different input lengths. Real-world LLM deployment instead uses a fixed system: one model and one context-management method for all inputs. The authors formalize the fixed-system setting and argue that the choice of context-management method critically determines computational power, with different methods yielding sharply different capabilities.

Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting, in which a fixed autoregressive Transformer is coupled with a fixed context-management method to process inputs of different lengths step by step, and (ii) a scaling-family setting, in which a family of different models (with increasing context-window length or numerical precision) is used to handle different input lengths. Existing proofs of Transformer Turing-completeness are frequently established in setting (ii), whereas real-world LLM deployment and the standard notion of Turing-completeness correspond more naturally to setting (i). In this paper, we first formalize the fixed-system setting, thereby providing a concrete characterization of how real-world LLMs operate. We then argue that results proved in the scaling-family setting provide theoretically meaningful resource bounds but do not establish Turing-completeness, thereby clarifying a common misinterpretation of existing results. Finally, we show that different context-management methods can yield sharply different computational power, and we advocate the position that context management is a central component that critically determines the computational power of real-world autoregressive Transformers.
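To make the fixed-system setting concrete, the following minimal Python sketch pairs a fixed next-token function with two context-management methods. It is an illustration under stated assumptions, not the paper's formalization: the lookup-table models (`ECHO`, `ANBN`), the `SlidingWindow` and `Tape` managers, and the `run` loop are all hypothetical names introduced here, and a deterministic lookup table stands in for a fixed autoregressive Transformer with greedy decoding.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical toy "model": a fixed table from a bounded window to the next token.
Table = Dict[Tuple[str, ...], str]


class SlidingWindow:
    """Context manager that keeps only the last `size` tokens (truncation)."""

    def __init__(self, prompt: str, size: int):
        self.size = size
        self.tokens = list(prompt)[-size:]

    def window(self) -> Tuple[str, ...]:
        return tuple(self.tokens)

    def config(self) -> Hashable:
        return tuple(self.tokens)  # the window is the entire system state

    def update(self, token: str) -> None:
        self.tokens = (self.tokens + [token])[-self.size:]


class Tape:
    """Context manager with unbounded external storage.

    The model only ever sees a window of two symbols: a control symbol and
    the cell under the head. Each emitted token is a 3-character command:
    symbol to write, move direction (L or R), next control symbol.
    """

    def __init__(self, prompt: str):
        self.cells = dict(enumerate(prompt))
        self.head, self.ctrl = 0, "0"

    def window(self) -> Tuple[str, ...]:
        return (self.ctrl, self.cells.get(self.head, "_"))

    def config(self) -> Hashable:
        return (self.ctrl, self.head, tuple(sorted(self.cells.items())))

    def update(self, token: str) -> None:
        write, move, self.ctrl = token
        self.cells[self.head] = write
        self.head += 1 if move == "R" else -1


def run(model: Table, ctx, max_steps: int = 100_000) -> str:
    """Step the fixed system until it halts, repeats a configuration, or times out."""
    seen = set()
    for _ in range(max_steps):
        if ctx.config() in seen:
            return "loop"  # a deterministic system that repeats a state never halts
        seen.add(ctx.config())
        token = model.get(ctx.window(), "reject")  # unknown windows reject
        if token in ("accept", "reject"):
            return token
        ctx.update(token)
    return "timeout"


# Toy model for the sliding window: never halts, so it must revisit a window.
ECHO: Table = {("a", "b"): "a", ("b", "a"): "b"}

# Toy model for the tape: finite control that accepts strings of the form a^n b^n.
# X marks a consumed "a", Z marks a consumed "b", "_" is a blank cell.
ANBN: Table = {
    ("0", "a"): "XR1", ("0", "Z"): "ZR3", ("0", "_"): "accept",
    ("1", "a"): "aR1", ("1", "Z"): "ZR1", ("1", "b"): "ZL2",
    ("2", "a"): "aL2", ("2", "Z"): "ZL2", ("2", "X"): "XR0",
    ("3", "Z"): "ZR3", ("3", "_"): "accept",
}

print(run(ECHO, SlidingWindow("ab", size=2)))  # loop
print(run(ANBN, Tape("aabb")))                 # accept
print(run(ANBN, Tape("aab")))                  # reject
```

Under the sliding-window manager the window is the entire system state, so a deterministic run over a finite vocabulary can visit only finitely many configurations. Under the tape-style manager, the same kind of bounded-window lookup table acts as the finite control of a Turing machine and can decide a non-regular language. Which of these regimes a particular deployed context-management method falls into is a separate question that this sketch does not settle.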
