On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
This provides a theoretical foundation for CoT reasoning in language models, addressing a gap in understanding their computational capabilities, though it is incremental in building on existing Turing completeness results.
The paper tackles the problem of understanding why chain-of-thought reasoning improves language model performance by formalizing it in a probabilistic setting, showing that recurrent and transformer LMs with CoT can represent the same family of distributions over strings as probabilistic Turing machines.
The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparing LMs to Turing machines, however, introduces a category error - Turing machines decide language membership, whereas LMs define distributions over strings. To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.