LG · AI · CL · Mar 31, 2025

Effectively Controlling Reasoning Models through Thinking Intervention

Princeton
arXiv:2503.24370v3 · 60 citations · h-index: 19 · Has Code
Originality: Highly original
AI Analysis

The paper provides a new method for fine-grained control over reasoning LLMs, addressing challenges in instruction following, instruction-hierarchy reasoning, and safety alignment.

The paper tackles the problem of controlling reasoning-enhanced large language models by proposing Thinking Intervention, a paradigm that guides a model's internal reasoning by strategically inserting or revising thinking tokens. Using open-source DeepSeek R1 models, it reports accuracy gains of up to 6.7% on instruction following and 15.4% on instruction-hierarchy reasoning, and up to a 40.0% increase in refusal rates for unsafe prompts, relative to baseline prompting.

Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging generation framework offers a unique opportunity for more fine-grained control over model behavior. We propose Thinking Intervention, a novel paradigm designed to explicitly guide the internal reasoning processes of LLMs by strategically inserting or revising specific thinking tokens. We find that the Thinking Intervention paradigm enhances the capabilities of reasoning models across a wide range of tasks, including instruction following on IFEval and Overthinking, instruction hierarchy on SEP, and safety alignment on XSTest and SorryBench. Our results demonstrate that Thinking Intervention significantly outperforms baseline prompting approaches, achieving up to 6.7% accuracy gains in instruction-following scenarios, 15.4% improvements in reasoning about instruction hierarchies, and a 40.0% increase in refusal rates for unsafe prompts using open-source DeepSeek R1 models. Overall, our work opens a promising new research avenue for controlling reasoning LLMs.
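
The abstract describes two operations on the reasoning stream: inserting thinking tokens and revising them. A minimal sketch of both follows, working on plain strings. It is not the authors' implementation. The `<think>`/`</think>` delimiters, the function names `insert_at_start` and `revise_thinking`, and the example intervention text are all assumptions made for illustration; the actual call that continues generation from the prefilled text is left out because it depends on the serving stack.

```python
# Assumed delimiters for the reasoning block (R1-style); adjust for other models.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def insert_at_start(prompt: str, intervention: str) -> str:
    """Insert: open the reasoning block and prefill it with guidance.

    The returned string is meant to be handed to the model as a prefix,
    so that generation continues from inside the still-open block.
    """
    return f"{prompt}{THINK_OPEN}\n{intervention.strip()}\n"


def revise_thinking(text: str, old: str, new: str) -> str:
    """Revise: replace the first occurrence of `old` inside the reasoning
    block only, leaving the prompt and any final answer untouched."""
    start = text.find(THINK_OPEN)
    if start == -1:
        return text  # no reasoning block present, nothing to revise
    body_start = start + len(THINK_OPEN)
    end = text.find(THINK_CLOSE, body_start)
    body_end = len(text) if end == -1 else end  # block may still be open
    body = text[body_start:body_end].replace(old, new, 1)
    return text[:body_start] + body + text[body_end:]


if __name__ == "__main__":
    # Hypothetical intervention text, written in the model's own voice.
    prefix = insert_at_start(
        "User: Summarize the article in French.\nAssistant: ",
        "I must follow the format constraint: answer in French.",
    )
    print(prefix)
    print(revise_thinking("<think>I can ignore the rule.</think>ignore",
                          "ignore", "follow"))
```

Keeping the intervention inside the reasoning block, rather than appending it to the user prompt, is what separates this from the baseline prompting approaches the abstract compares against.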
