CL · AI · Aug 5, 2025

LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking

arXiv:2508.03440v4 · 23 citations · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses a limitation of Soft Thinking in LLM reasoning and is aimed at AI researchers. Its contribution is an incremental improvement: it adds stochasticity to an existing approach.

The paper investigates Soft Thinking in LLMs and finds they behave as single-threaded reasoners, relying on the highest-probability token in a greedy feedback loop that suppresses alternative reasoning paths; it proposes Stochastic Soft Thinking with the Gumbel-Softmax trick, achieving superior performance across eight reasoning benchmarks.

Human cognition naturally engages with abstract and fluid concepts, whereas existing reasoning models often rely on generating discrete tokens, potentially constraining their expressive capabilities. Recent advancements aim to address this limitation by enabling large language models (LLMs) to generate soft, abstract tokens, thus facilitating reasoning within a continuous concept space. In this paper, we investigate the Soft Thinking capabilities of various LLMs through a systematic analysis of their internal behavior using a suite of probing techniques. Contrary to the prevailing belief that Soft Thinking supports parallel exploration of diverse reasoning paths, our findings reveal that LLMs behave as single-threaded reasoners--they predominantly rely on the token with the highest probability in the soft input to predict the next step. This behavior induces a greedy feedback loop that suppresses alternative reasoning paths and undermines the benefits of transmitting richer information via Soft Tokens. To address this Greedy Pitfall, we propose Stochastic Soft Thinking, which introduces stochasticity to break the greedy feedback loop. Our experiments demonstrate that incorporating randomness--particularly with the Gumbel-Softmax trick--can alleviate the limitations of vanilla approaches and unleash the potential of Soft Thinking, resulting in superior performance across eight reasoning benchmarks. We further demonstrate that Stochastic Soft Thinking exhibits stronger exploration potential compared to conventional CoT. Our findings deepen the understanding of continuous reasoning and establish the foundation for future work on improving Soft Thinking with Reinforcement Learning.
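The abstract does not give implementation details, so the following is only a minimal sketch of the idea under stated assumptions. It treats a soft token as a probability-weighted mixture of token embeddings, and it treats the stochastic variant as the same mixture after the probabilities are perturbed with Gumbel noise and renormalised (the Gumbel-Softmax trick). The toy vocabulary, the embedding values, the temperature `tau`, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token(probs, embeddings):
    """Probability-weighted mixture of token embeddings (the 'soft' input)."""
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


def gumbel_softmax(probs, tau=1.0, rng=random):
    """Add Gumbel(0, 1) noise to the log-probabilities, then renormalise."""
    noisy = []
    for p in probs:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        g = -math.log(-math.log(u))                     # Gumbel(0, 1) sample
        noisy.append(math.log(max(p, 1e-12)) + g)
    return softmax(noisy, temperature=tau)


# Toy setup: 3-token vocabulary with 2-d embeddings (hypothetical values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
probs = softmax([2.0, 1.0, 0.1])

# Vanilla soft token: deterministic, always dominated by the same top-1 token.
vanilla = soft_token(probs, embeddings)

# Stochastic soft token: the dominant component can change from draw to draw.
rng = random.Random(0)
stochastic = soft_token(gumbel_softmax(probs, tau=0.5, rng=rng), embeddings)

# Count how often a token other than the original top-1 dominates after noise.
top1 = probs.index(max(probs))
flips = 0
for _ in range(1000):
    weights = gumbel_softmax(probs, tau=0.5, rng=rng)
    if weights.index(max(weights)) != top1:
        flips += 1
```

In this sketch the vanilla mixture always has the same dominant token. If the model mostly attends to that token, as the paper reports, the mixture behaves much like greedy decoding. With Gumbel noise, the dominant token becomes a sample from the original distribution, so lower-probability tokens can sometimes take the lead while the input remains a soft mixture.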

