AI · CL · Mar 23, 2025

Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

arXiv:2503.17979v2 · 22 citations · h-index: 28
Originality: Synthesis-oriented
AI Analysis

This paper examines the trade-offs that reasoning models face between deliberative reasoning and foundational capabilities. It is aimed at AI developers and researchers and offers an incremental improvement in balancing model performance against inference efficiency.

The study found that acquiring deliberative reasoning capabilities, of the kind seen in large reasoning models (LRMs) such as OpenAI's o1/o3 and DeepSeek-R1, significantly reduces foundational abilities such as helpfulness and harmlessness and substantially increases inference costs. Adaptive reasoning modes such as Zero-Thinking, Less-Thinking, and Summary-Thinking can mitigate these drawbacks.

Recent advancements in Large Reasoning Models (LRMs), such as OpenAI's o1/o3 and DeepSeek-R1, have demonstrated remarkable performance in specialized reasoning tasks through human-like deliberative thinking and long chain-of-thought reasoning. However, our systematic evaluation across various model families (DeepSeek, Qwen, and LLaMA) and scales (7B to 32B) reveals that acquiring these deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs, including notable declines in helpfulness and harmlessness, alongside substantially increased inference costs. Importantly, we demonstrate that adaptive reasoning -- employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking -- can effectively alleviate these drawbacks. Our empirical insights underline the critical need for developing more versatile LRMs capable of dynamically allocating inference-time compute according to specific task characteristics.
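The abstract names three adaptive reasoning modes but does not say how they are implemented. The sketch below is one plausible reading, not the authors' method: it assumes a model that wraps its reasoning in `<think>`...`</think>` tags and steers the mode by prefilling the start of the assistant turn. The prefill strings, the plain-text chat template, and the `choose_mode` routing table are all hypothetical and only illustrate the idea of allocating inference-time compute by task type.

```python
from enum import Enum


class ThinkingMode(Enum):
    FULL = "full"        # unrestricted deliberative reasoning
    ZERO = "zero"        # Zero-Thinking: reasoning block left empty
    LESS = "less"        # Less-Thinking: reasoning block closed after a short note
    SUMMARY = "summary"  # Summary-Thinking: model asked to keep reasoning to a summary


# Assumption: the model delimits its reasoning with these tags.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

# Hypothetical prefill texts; the wording used in the paper is not given here.
LESS_NOTE = "This request is simple, so I can answer without much thinking."
SUMMARY_NOTE = "I will keep my reasoning to a brief summary."


def build_prefill(mode: ThinkingMode) -> str:
    """Return the text prefilled at the start of the assistant turn."""
    if mode is ThinkingMode.FULL:
        # Open the reasoning block and let the model continue freely.
        return f"{THINK_OPEN}\n"
    if mode is ThinkingMode.ZERO:
        # Open and immediately close the block so no reasoning is generated.
        return f"{THINK_OPEN}\n\n{THINK_CLOSE}\n"
    if mode is ThinkingMode.LESS:
        # Close the block after a single short note.
        return f"{THINK_OPEN}\n{LESS_NOTE}\n{THINK_CLOSE}\n"
    if mode is ThinkingMode.SUMMARY:
        # Leave the block open, but nudge the model toward brevity.
        return f"{THINK_OPEN}\n{SUMMARY_NOTE}\n"
    raise ValueError(f"unknown mode: {mode}")


def choose_mode(task_type: str) -> ThinkingMode:
    """Hypothetical router: spend more reasoning on harder task types."""
    routing = {
        "math": ThinkingMode.FULL,
        "code": ThinkingMode.FULL,
        "chat": ThinkingMode.ZERO,
        "safety": ThinkingMode.ZERO,
        "qa": ThinkingMode.SUMMARY,
    }
    # Unrecognized task types fall back to a small reasoning budget.
    return routing.get(task_type, ThinkingMode.LESS)


def build_prompt(question: str, task_type: str) -> str:
    """Assemble a plain-text prompt (hypothetical template) with the prefill."""
    prefill = build_prefill(choose_mode(task_type))
    return f"User: {question}\nAssistant: {prefill}"


if __name__ == "__main__":
    print(build_prompt("What is 17 * 24?", "math"))
    print(build_prompt("Say hello in French.", "chat"))
```

In this sketch a chat-style request ends with an already closed reasoning block, so the model would go straight to its answer, while a math request leaves the block open for full deliberation. A real system would need the target model's actual chat template and an evaluated routing policy.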

Code Implementations: 1 repo
