An Asymptotic Theory of Chain-of-Thought in In-Context Learning

arXiv:2606.03217


AI Analysis

Provides a unified theoretical understanding of how test-time reasoning depth affects generalization in LLMs, addressing a poorly understood scaling behavior.

This paper develops an exact asymptotic theory for chain-of-thought reasoning in in-context learning, deriving generalization error as a function of reasoning depth, pretraining data, and context length. It reveals phase transitions between exponential and polynomial improvement, saturation, and overthinking, showing that deeper reasoning helps only with sufficient pretraining and context.

Chain-of-thought (CoT) reasoning has become a widely used mechanism for eliciting multi-step reasoning in large language models by generating intermediate reasoning steps at inference time. Yet the scaling behavior of generalization with CoT depth remains poorly understood. To address this question, we study a theoretically solvable model of CoT for in-context weight prediction in linear regression, where test-time reasoning is represented as an iterative refinement of the weight-parameter estimate. Using tools from random matrix theory under high-dimensional asymptotics, we derive an exact formula for the generalization error as a function of reasoning depth, pretraining data amount, and context length. Our analysis reveals a sharp phase transition separating exponential and polynomial improvement, saturation, and overthinking, and characterizes how the optimal reasoning depth scales. We further show that deeper reasoning is most effective with sufficiently rich pretraining and in-context information, whereas limited pretraining or context makes longer reasoning prone to error amplification or saturation. We also validate these predictions through experiments on fully learned linear attention and softmax attention models. Our results provide a unified theoretical account of how test-time CoT depth affects generalization.
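To make the phrase "iterative refinement of the weight-parameter estimate" concrete, here is a minimal sketch in which each reasoning step is one gradient step on the in-context least-squares loss. This is an illustrative assumption, not the paper's model: the abstract does not specify the update rule, and a pretrained model would presumably learn its own update from pretraining data. The names `cot_refine` and `error_by_depth`, the scalar `step`, the zero initialization, and all default values are hypothetical.

```python
import numpy as np


def cot_refine(X, y, depth, step):
    """Return the weight estimates after 0, 1, ..., depth refinement steps.

    Assumption: one "reasoning step" is one gradient step on the in-context
    squared loss, starting from w = 0. The paper's actual update may differ.
    """
    n, d = X.shape
    w = np.zeros(d)
    estimates = [w.copy()]
    for _ in range(depth):
        residual = y - X @ w                  # prediction error on the context
        w = w + step * (X.T @ residual) / n   # refine the estimate
        estimates.append(w.copy())
    return estimates


def error_by_depth(d=20, n=200, noise=0.1, depth=30, step=0.5, seed=0):
    """Squared distance to the true weights at each reasoning depth."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d) / np.sqrt(d)     # task weights
    X = rng.standard_normal((n, d))                  # in-context inputs
    y = X @ w_star + noise * rng.standard_normal(n)  # noisy in-context labels
    return [float(np.sum((w - w_star) ** 2))
            for w in cot_refine(X, y, depth, step)]


if __name__ == "__main__":
    errors = error_by_depth()
    for t in (0, 1, 5, 10, 30):
        print(f"depth {t:2d}: error {errors[t]:.4f}")
```

With these defaults the context is much longer than the dimension (n = 200, d = 20), so the error shrinks quickly with depth and then levels off near the noise floor. The sketch does not reproduce the phase transitions or the overthinking regime described in the abstract; those depend on the pretraining component, which is omitted here.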
