CL · AI · Dec 25, 2025

Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought

arXiv:2512.21711v1 · 9 citations · h-index: 14
Originality: Synthesis-oriented
AI Analysis

This paper reveals fundamental weaknesses in a popular method for enhancing reasoning in LLMs, suggesting that the method's contribution is incremental and that its results are potentially misleading for reliability-focused applications.

The paper investigates the reliability of latent tokens in Chain-of-Continuous-Thought (COCONUT) for large language models. It finds that they act as uninterpretable placeholders and that, on MMLU and HotpotQA, COCONUT exploits dataset artifacts rather than performing genuine reasoning.

Latent tokens are gaining attention for enhancing reasoning in large language models (LLMs), yet their internal mechanisms remain unclear. This paper examines the problem from a reliability perspective, uncovering fundamental weaknesses: latent tokens function as uninterpretable placeholders rather than encoding faithful reasoning. While resistant to perturbation, they promote shortcut usage over genuine reasoning. We focus on Chain-of-Continuous-Thought (COCONUT), which claims better efficiency and stability than explicit Chain-of-Thought (CoT) while maintaining performance. We investigate this through two complementary approaches. First, steering experiments perturb specific token subsets, namely COCONUT and explicit CoT. Unlike CoT tokens, COCONUT tokens show minimal sensitivity to steering and lack reasoning-critical information. Second, shortcut experiments evaluate models under biased and out-of-distribution settings. Results on MMLU and HotpotQA demonstrate that COCONUT consistently exploits dataset artifacts, inflating benchmark performance without true reasoning. These findings reposition COCONUT as a pseudo-reasoning mechanism: it generates plausible traces that conceal shortcut dependence rather than faithfully representing reasoning processes.
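The steering idea in the abstract can be illustrated with a minimal sketch: add a scaled direction vector to the representations at a chosen subset of token positions, then measure how much the output distribution moves. Everything here is an assumption for illustration, not the paper's procedure: `toy_readout` is a hypothetical stand-in for a model head (mean pooling plus a linear layer), and KL divergence is one possible sensitivity measure among several.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_readout(hidden, weights):
    """Hypothetical model head: mean-pool token vectors, then apply a linear layer."""
    dim = len(hidden[0])
    pooled = [sum(tok[j] for tok in hidden) / len(hidden) for j in range(dim)]
    logits = [sum(w[j] * pooled[j] for j in range(dim)) for w in weights]
    return softmax(logits)

def steer(hidden, positions, direction, alpha):
    """Return a copy of hidden with alpha * direction added at the given positions."""
    steered = [list(tok) for tok in hidden]
    for p in positions:
        steered[p] = [x + alpha * d for x, d in zip(steered[p], direction)]
    return steered

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def steering_sensitivity(hidden, positions, direction, alpha, weights):
    """Shift in the output distribution caused by steering the chosen positions."""
    base = toy_readout(hidden, weights)
    perturbed = toy_readout(steer(hidden, positions, direction, alpha), weights)
    return kl_divergence(base, perturbed)

# Toy data (assumed): three token vectors of dimension 2, two output classes.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
direction = [1.0, 0.0]

no_steer = steering_sensitivity(hidden, [], direction, 3.0, weights)
with_steer = steering_sensitivity(hidden, [0], direction, 3.0, weights)
print(no_steer, with_steer)
```

In this framing, a token subset whose steering leaves the sensitivity near zero carries little information that the output depends on, which is the kind of contrast the abstract draws between COCONUT tokens and explicit CoT tokens. A real experiment would steer a model's actual hidden states rather than this toy readout.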

