AI | CL | Oct 5, 2025

Internal states before wait modulate reasoning patterns

arXiv:2510.04128v1 | 3 citations | h-index: 5 | EMNLP
Originality: Incremental advance
AI Analysis

This work provides insights into the internal mechanisms of reasoning in AI models, which could help improve their performance, but it is incremental as it builds on prior understanding of 'wait' tokens.

The study investigated whether latent states before 'wait' tokens in reasoning models contain information that modulates subsequent reasoning patterns, and identified a small set of features that influence behaviors like restarting, recalling, expressing uncertainty, and double-checking.

Prior work has shown that a significant driver of performance in reasoning models is their ability to reason and self-correct. A distinctive marker in these reasoning traces is the token 'wait', which often signals reasoning behavior such as backtracking. Although this behavior is complex, little is understood about exactly why models do or do not decide to reason in this particular manner, which limits our understanding of what makes a reasoning model so effective. In this work, we address the question of whether a model's latents preceding wait tokens contain relevant information for modulating the subsequent reasoning process. We train crosscoders at multiple layers of DeepSeek-R1-Distill-Llama-8B and its base version, and introduce a latent attribution technique in the crosscoder setting. We locate a small set of features relevant for promoting or suppressing the probabilities of wait tokens. Finally, through a targeted series of experiments analyzing max-activating examples and causal interventions, we show that many of our identified features are indeed relevant for the reasoning process and give rise to different types of reasoning patterns such as restarting from the beginning, recalling prior knowledge, expressing uncertainty, and double-checking.
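
To make the two ingredients named in the abstract more concrete, here is a minimal NumPy sketch of what a crosscoder forward pass and a first-order latent attribution could look like. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the architecture or the attribution formula, so the ReLU encoder over concatenated activations, the per-model decoders, the "activation times gradient" score, and all names (`crosscoder_forward`, `wait_attribution`, `grad_wait`) are hypothetical.

```python
import numpy as np

def crosscoder_forward(x_base, x_reason, W_enc, b_enc, W_dec_base, W_dec_reason):
    """Encode paired activations into shared latents, then decode per model.

    ASSUMPTION: a single ReLU encoder over the concatenated activations of the
    base and reasoning models, with one linear decoder per model.
    """
    x = np.concatenate([x_base, x_reason])        # shape (2d,)
    f = np.maximum(0.0, W_enc @ x + b_enc)        # shape (k,), shared latents
    return f, W_dec_base @ f, W_dec_reason @ f    # latents and two reconstructions

def wait_attribution(f, W_dec_reason, grad_wait):
    """First-order score per latent for its effect on the 'wait' logit.

    ASSUMPTION: grad_wait is the gradient of the 'wait' logit with respect to
    the reasoning model's activation at this layer, computed elsewhere.
    Score for latent i = f[i] * (decoder direction i . grad_wait).
    """
    return f * (W_dec_reason.T @ grad_wait)

# Toy usage with random weights (d = activation width, k = number of latents).
rng = np.random.default_rng(0)
d, k = 4, 6
W_enc, b_enc = rng.normal(size=(k, 2 * d)), rng.normal(size=k)
W_dec_base, W_dec_reason = rng.normal(size=(d, k)), rng.normal(size=(d, k))
f, rec_base, rec_reason = crosscoder_forward(
    rng.normal(size=d), rng.normal(size=d), W_enc, b_enc, W_dec_base, W_dec_reason
)
scores = wait_attribution(f, W_dec_reason, rng.normal(size=d))
top_promoting = np.argsort(-scores)[:3]   # latents with the largest positive score
```

In this sketch, positive scores mark latents whose decoder direction pushes the 'wait' logit up and negative scores mark latents that push it down, which is one simple way to shortlist candidate features before running causal interventions.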

