Chain-of-Thought and Compressed Looped Transformers: A Memory-Budget Separation
This work clarifies the memory limitations of different reasoning paradigms for researchers and practitioners developing more efficient and capable AI models, particularly highlighting the distinction between compressed and full sequence-state loops.
This paper compares chain-of-thought prompting and looped Transformers, focusing on their memory usage during test-time computation. It demonstrates that compressed looped Transformers are limited by their recurrent state size, preventing them from solving P-complete problems under logspace reductions, unlike polynomial-length chain-of-thought.
Chain-of-thought prompting and looped Transformers both give a fixed model more test-time computation, but they differ in what they remember. Chain-of-thought stores intermediate state in generated tokens that remain in the context, whereas a looped Transformer carries state through recurrent hidden activations. We argue that this persistent mutable memory is a central resource for test-time reasoning. We compare three memory regimes, the compressed latent loop, the full sequence-state loop, and the chain-of-thought scratchpad. Our main result shows that a compressed loop is limited by the size of its recurrent state. Running the loop longer adds computation but does not by itself create a growing scratchpad, so a loop with a small recurrent state remains a small-space reasoner even when run for many steps. Under a standard complexity assumption, such loops cannot decide problems that are P-complete under logspace reductions, whereas polynomial-length chain-of-thought can. The separation is specific to compressed loops, as full sequence-state loops carry state at every input position and live in a memory-rich regime closer to explicit scratchpads. Controlled pointer-chasing and associative-recall sweeps illustrate this memory-budget view, with performance sensitive to whether the persistent-state budget matches the task's working-memory demand.