Demystifying Hybrid Thinking: Can LLMs Truly Switch Between Think and No-Think?
This work addresses controllability in hybrid thinking LLMs, which matters for inference efficiency. The contribution is incremental, however, because it refines an existing training approach.
The paper tackles reasoning leakage in hybrid thinking LLMs, where reasoning behaviors appear even in no-think mode. It proposes a training recipe that maintains accuracy on MATH500 while reducing no-think output length from 1085 to 585 tokens and occurrences of reasoning-supportive tokens (such as "wait") from 5917 to 522.
Hybrid thinking enables LLMs to switch between reasoning and direct answering, offering a balance between efficiency and reasoning capability. Yet our experiments reveal that current hybrid thinking LLMs achieve only partial mode separation: reasoning behaviors often leak into the no-think mode. To understand and mitigate this, we analyze the factors influencing controllability and identify four that matter most: (1) larger data scale, (2) using think and no-think answers from different questions rather than the same question, (3) a moderate increase in the amount of no-think data, and (4) a two-phase strategy that first trains reasoning ability and then applies hybrid think training. Building on these findings, we propose a practical recipe that, compared to standard training, maintains accuracy in both modes while significantly reducing no-think output length (from 1085 to 585 on MATH500) and occurrences of reasoning-supportive tokens such as "wait" (from 5917 to 522 on MATH500). Our findings highlight the limitations of current hybrid thinking and offer directions for strengthening its controllability.
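Factors (2) to (4) describe how the fine-tuning data is assembled, so a small data-construction sketch may help make them concrete. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Example` record, the `build_hybrid_mixture` helper, the "/think" and "/no_think" control tags, the default `no_think_ratio` of 1.5 (standing in for a "moderate" increase), and reusing the phase-2 think data for phase 1 are all hypothetical choices. The sketch draws think and no-think targets from disjoint questions (factor 2), gives no-think data a somewhat larger share (factor 3), and returns a reasoning-only first phase followed by a mixed hybrid phase (factor 4).

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    think_answer: str     # long reasoning trace followed by the final answer
    no_think_answer: str  # direct answer with no reasoning trace


def build_hybrid_mixture(examples, no_think_ratio=1.5, seed=0):
    """Return (phase1, phase2) lists of (prompt, target) pairs.

    Hypothetical assumptions, not taken from the paper:
    - the mode is selected by appending a "/think" or "/no_think" tag;
    - no_think_ratio is the no-think : think ratio in the hybrid phase;
    - phase 1 reuses the think pairs instead of a separate reasoning set.
    """
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)

    # Factor (2): split the questions so no question appears in both modes.
    n_no_think = round(len(pool) * no_think_ratio / (1 + no_think_ratio))
    no_think_pool, think_pool = pool[:n_no_think], pool[n_no_think:]

    think_pairs = [(f"{ex.question} /think", ex.think_answer) for ex in think_pool]
    no_think_pairs = [
        (f"{ex.question} /no_think", ex.no_think_answer) for ex in no_think_pool
    ]

    # Factor (4): phase 1 trains reasoning only; phase 2 mixes both modes.
    phase1 = list(think_pairs)
    phase2 = think_pairs + no_think_pairs
    rng.shuffle(phase2)
    return phase1, phase2


if __name__ == "__main__":
    data = [Example(f"q{i}", f"reasoning {i}", f"answer {i}") for i in range(10)]
    phase1, phase2 = build_hybrid_mixture(data)
    print(len(phase1), len(phase2))
```

With 10 examples and the default ratio of 1.5, the hybrid phase holds 4 think pairs and 6 no-think pairs, and no question appears in both modes. Factor (1), data scale, corresponds to simply passing a larger input pool.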