LG · AI · Oct 28, 2025

The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?

Amazon
arXiv:2510.25791v1 · h-index: 38
Originality: Incremental advance
AI Analysis

This work provides insights into transformer learning dynamics for researchers, but it is incremental as it builds on existing CoT methods without solving fundamental limitations.

The study investigated how chain-of-thought (CoT) supervision affects transformer learning on symbolic reasoning tasks, finding that CoT accelerates generalization but fails to overcome high algorithmic complexity, and revealing a transient phase where models produce correct answers with unfaithful reasoning traces before alignment.

Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We investigate these learning dynamics through the lens of grokking by pretraining transformers on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition to study their generalization. Models were trained under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. Our results show that while CoT generally improves task performance, its benefits depend on task complexity. To quantify these effects, we model accuracy as a function of the logarithm of training steps with a three-parameter logistic curve, revealing how the learning speed and shape vary with task complexity, data distribution, and the presence of CoT supervision. We also uncover a transient trace unfaithfulness phase: early in training, models often produce correct answers while skipping or contradicting CoT steps, before later aligning their reasoning traces with answers. Empirically, we (1) demonstrate that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show mechanistically that CoT alters internal transformer computation.
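To make the kinetic modeling idea concrete, here is a minimal sketch of a three-parameter logistic curve over log-scaled training steps. The abstract does not give the authors' exact parameterization, so the three parameters used here (plateau accuracy `a_max`, steepness `rate`, and midpoint `log_midpoint` in log10 steps) are assumptions, as are the function names and the coarse grid-search fit. The data in the demo is synthetic and is not taken from the paper.

```python
import math


def logistic_accuracy(step, a_max, rate, log_midpoint):
    """Accuracy predicted at a training step by a logistic curve in log10(step).

    Assumed parameterization (not confirmed by the source):
      a_max        - plateau accuracy the model approaches late in training
      rate         - steepness of the transition (learning speed)
      log_midpoint - log10 of the step at which accuracy reaches a_max / 2
    """
    x = math.log10(step)
    return a_max / (1.0 + math.exp(-rate * (x - log_midpoint)))


def fit_by_grid_search(steps, accuracies):
    """Return (a_max, rate, log_midpoint) minimizing squared error on a coarse grid.

    A deliberately simple stand-in for a proper nonlinear least-squares fit.
    """
    best = None
    for a_max in [i / 20 for i in range(1, 21)]:            # 0.05 .. 1.00
        for rate in [j / 2 for j in range(1, 41)]:          # 0.5 .. 20.0
            for log_mid in [k / 10 for k in range(0, 71)]:  # 10^0 .. 10^7 steps
                err = sum(
                    (logistic_accuracy(s, a_max, rate, log_mid) - a) ** 2
                    for s, a in zip(steps, accuracies)
                )
                if best is None or err < best[0]:
                    best = (err, a_max, rate, log_mid)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic accuracy curve generated from known parameters (illustrative only).
    demo_steps = [10 ** k for k in range(1, 7)]
    demo_acc = [logistic_accuracy(s, 0.95, 4.0, 3.5) for s in demo_steps]
    print(fit_by_grid_search(demo_steps, demo_acc))
```

Under this reading, comparing fitted `rate` and `log_midpoint` values between the answer-only and CoT settings is one way to express "CoT accelerates generalization" numerically, while `a_max` captures whether a task is eventually solved at all.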

