ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control
The paper addresses inefficient and brittle reasoning in AI systems for scientific tasks and represents an incremental improvement over existing methods.
The paper tackles expert-level scientific reasoning for large language models by introducing ReThinker, a confidence-aware agentic framework that dynamically allocates computation based on model confidence. It reports state-of-the-art results on benchmarks such as Humanity's Last Exam, GAIA, and XBench.
Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordination, and inefficient test-time scaling often limit performance. We introduce ReThinker, a confidence-aware agentic framework that orchestrates retrieval, tool use, and multi-agent reasoning through a stage-wise Solver-Critic-Selector architecture. Rather than following a fixed pipeline, ReThinker dynamically allocates computation based on model confidence, enabling adaptive tool invocation, guided multi-dimensional reflection, and robust confidence-weighted selection. To support scalable training without human annotation, we further propose a reverse data synthesis pipeline and an adaptive trajectory recycling strategy that transform successful reasoning traces into high-quality supervision. Experiments on HLE, GAIA, and XBench demonstrate that ReThinker consistently outperforms state-of-the-art foundation models with tools and existing deep research systems, achieving state-of-the-art results on expert-level reasoning tasks.
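The abstract does not spell out how confidence drives computation, so the following is only a minimal sketch of one way a confidence-gated Solver-Critic-Selector loop could look. Everything in it is an assumption made for illustration: the names `Candidate`, `select_by_confidence`, and `rethink`, the 0.8 threshold, the round limit, and the choice to sum confidences per answer are hypothetical and are not taken from the paper. The solver and critic are passed in as plain callables standing in for model calls.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    """One proposed answer with a self-reported confidence in [0, 1] (hypothetical type)."""
    answer: str
    confidence: float


def select_by_confidence(candidates: List[Candidate]) -> str:
    """Selector stand-in: sum confidence per distinct answer and return the top answer."""
    if not candidates:
        raise ValueError("no candidates to select from")
    totals: Dict[str, float] = defaultdict(float)
    for cand in candidates:
        totals[cand.answer] += cand.confidence
    return max(totals, key=totals.get)


def rethink(
    question: str,
    solve: Callable[[str, Optional[str]], Candidate],
    critique: Callable[[str, Candidate], str],
    threshold: float = 0.8,  # assumed cutoff, not a value from the paper
    max_rounds: int = 3,     # assumed budget, not a value from the paper
) -> str:
    """Spend extra solver/critic rounds only while the latest confidence is low."""
    candidates = [solve(question, None)]
    for _ in range(max_rounds):
        if candidates[-1].confidence >= threshold:
            break  # confident enough: stop allocating more computation
        feedback = critique(question, candidates[-1])
        candidates.append(solve(question, feedback))
    return select_by_confidence(candidates)
```

In this sketch a question answered confidently on the first attempt costs a single solver call, while a low-confidence question triggers up to `max_rounds` critique-and-retry rounds before the selector aggregates all attempts.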