CL · Feb 2

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

arXiv:2602.02160v1 · h-index: 2 · Has Code
Originality: Highly original
AI Analysis

The paper addresses the lack of sub-task decomposition when large reasoning models perform complex tool use, and reports significant performance gains on tool-use benchmarks.

The paper tackles the problem of lazy reasoning in large reasoning models by proposing D-CORE, a two-stage training framework that incentivizes task decomposition and reflective reasoning, achieving state-of-the-art accuracy of 79.3% on BFCLv3 with a 14B model that outperforms larger 70B models.

Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool use scenarios, leading to Lazy Reasoning. To address this, we propose a two-stage training framework, D-CORE (Decomposing tasks and Composing Reasoning processes), that first incentivizes the LRMs' task decomposition reasoning capability via self-distillation and then applies diversity-aware reinforcement learning (RL) to restore the LRMs' reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate the superiority of our method: D-CORE-8B reaches 77.7% accuracy, surpassing the best-performing 8B model by 5.7%. Meanwhile, D-CORE-14B establishes a new state-of-the-art at 79.3%, outperforming 70B models despite being 5× smaller. The source code is available at https://github.com/alibaba/EfficientAI.

