Agentic-R1: Distilled Dual-Strategy Reasoning
This work addresses the need for more robust and efficient reasoning in AI systems, particularly on mathematical and logical tasks. The contribution appears incremental, since it builds on existing teacher-student distillation and tool-augmented methods.
The paper targets the slow, error-prone reasoning of long chain-of-thought models. It introduces DualDistill, a fine-tuning framework that distills complementary reasoning strategies into a single model, Agentic-R1. Agentic-R1 dynamically selects the optimal strategy for each query, which improves accuracy across a range of tasks.
Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning. Our project is available at https://github.com/StigLidu/DualDistill
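The abstract does not say how outputs from the two teachers are combined into training data, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes one tool-using teacher and one text-reasoning teacher, and a hypothetical selection rule that keeps every teacher trajectory whose final answer matches the reference. The names `Trajectory`, `build_distillation_set`, and both toy teachers are illustrative and do not come from the DualDistill repository.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    strategy: str  # "tool" (code execution) or "text" (natural-language reasoning)
    solution: str  # the teacher's full reasoning trace
    answer: str    # the final answer extracted from the trace


def build_distillation_set(problems, tool_teacher, text_teacher):
    """Collect fine-tuning examples from whichever teacher solved each problem.

    Hypothetical rule (an assumption, not taken from the paper): keep every
    trajectory whose final answer matches the reference answer.
    """
    examples = []
    for question, reference in problems:
        for teacher in (tool_teacher, text_teacher):
            traj = teacher(question)
            if traj.answer.strip() == reference.strip():
                examples.append(
                    {
                        "prompt": question,
                        "completion": traj.solution,
                        "strategy": traj.strategy,
                    }
                )
    return examples


def toy_tool_teacher(question):
    # Stand-in for a tool-augmented agent: only handles "a * b" products.
    try:
        a, b = (int(part) for part in question.split("*"))
    except ValueError:
        return Trajectory("tool", "no executable plan found", "")
    return Trajectory("tool", f"print({a} * {b})", str(a * b))


def toy_text_teacher(question):
    # Stand-in for a long-CoT model: handles one abstract question, guesses otherwise.
    if "rectangle" in question:
        return Trajectory("text", "A square has four right angles, so yes.", "yes")
    return Trajectory("text", "Roughly 400.", "400")


problems = [
    ("12 * 34", "408"),
    ("Is every square a rectangle?", "yes"),
]
data = build_distillation_set(problems, toy_tool_teacher, toy_text_teacher)
for example in data:
    print(example["strategy"], "<-", example["prompt"])
```

In this toy setup the arithmetic problem contributes a tool trajectory and the abstract question contributes a text trajectory, so a student fine-tuned on such a set would see both strategies paired with the kinds of queries where each one succeeded.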