AI · May 26

What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

arXiv:2605.26795

Predicted impact: top 36% in AI (last 90 days) · Originality: incremental advance

AI Analysis

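For researchers and practitioners using CoT prompting, this work suggests that the probe-time benefit of a fixed rationale comes largely from lexical activation and local token co-occurrence rather than from sentence-level logical derivation, so a rationale's vocabulary and local phrasing may matter more than its logical ordering.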

The paper investigates why chain-of-thought (CoT) prompting improves language model accuracy at probe time, that is, when a fixed rationale is already in context. It finds that the gains stem primarily from lexical activation and short-range token co-occurrence (windows of 2-3 tokens) rather than global logical structure. Word-shuffled rationales still outperform the no-rationale baseline, and preserving short contiguous token windows recovers most of the remaining gap to full CoT performance.

Chain-of-thought (CoT) prompting reliably improves language-model accuracy, but which properties of a rationale text drive the improvement is poorly understood. Prior work has largely studied generation-time behavior. We instead ask a probe-time question: given a fixed rationale in context, what in that text changes the answer? We identify two complementary sources of the gain. First, even a globally word-shuffled rationale substantially outperforms the no-rationale baseline, indicating a strong lexical activation effect. More importantly, the additional gain from structured text appears to arise less from sentence-level logical ordering and more from short-range token adjacency. Preserving contiguous windows of just $n^\star{=}2$--$3$ tokens recovers most of the remaining gain toward full CoT performance. Supporting experiments rule out copying of explicit answer declarations or answer values, as well as full grammatical realization, as primary drivers. Further generalization experiments show that the qualitative pattern remains stable across multiple model families, parameter scales, and datasets. These results support a local co-occurrence activation (LCA) account of probe-time CoT, in which the observed gains appear to arise primarily from lexical activation and short-range token co-occurrence rather than sentence-level logical derivation.
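The window-preserving perturbation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `window_shuffle`, the whitespace tokenization, and the non-overlapping windowing scheme are all assumptions made for illustration, and the paper may define tokens and windows differently (for example, with a model tokenizer).

```python
import random


def window_shuffle(text: str, n: int, seed: int = 0) -> str:
    """Shuffle a rationale while keeping contiguous windows of n tokens intact.

    Hypothetical helper for illustration only. Tokens are whitespace-separated
    words (an assumption), and windows are non-overlapping chunks of n tokens.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    # Cut the token sequence into consecutive windows of n tokens each;
    # the last window may be shorter.
    windows = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    # Shuffle the order of the windows, not the tokens inside them.
    rng = random.Random(seed)
    rng.shuffle(windows)
    return " ".join(tok for window in windows for tok in window)


rationale = "add 3 and 4 to get 7 then double it to get 14"
global_shuffle = window_shuffle(rationale, n=1)  # every word moves independently
local_windows = window_shuffle(rationale, n=3)   # 3-token adjacency is preserved
```

Under these assumptions, `n=1` corresponds to a global word shuffle, and `n=2` or `n=3` corresponds to the short windows ($n^\star$) that the abstract reports as recovering most of the remaining gain. Any `n` at least as large as the token count returns the rationale unchanged.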
