FLLGLONov 25, 2025

Softmax Transformers are Turing-Complete

arXiv:2511.20038v17 citations
Originality Highly original
AI Analysis

This resolves a foundational theoretical question in AI about the computational limits of widely used transformer architectures, with implications for understanding their reasoning capabilities.

The paper tackles the open problem of whether softmax attention Chain-of-Thought transformers are Turing-complete, proving that length-generalizable versions are Turing-complete for letter-bounded languages and with extensions for arbitrary languages, and empirically validates this with transformers trained on complex arithmetic reasoning tasks.

Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention Chain-of-Thought (CoT) transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax CoT transformers are Turing-complete. More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which correspond to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (more generally, for letter-bounded languages). While we show this is not Turing-complete for arbitrary languages, we prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theory by training transformers for languages requiring complex (non-linear) arithmetic reasoning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes