AI · Jun 3

SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

Xiangyu Zhao, Hengyuan Zhao, Yiheng Wang, Wanghan Xu, Yuhao Zhou, Qinglong Cao, Zhiwang Zhou, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

arXiv:2606.04579

AI Analysis

This work targets scientific reasoning in biology, chemistry, and physics, where tool usage and intermediate reasoning steps currently go largely unverified, and proposes a method for improving foundation models on such tasks.

The paper introduces Sci-PRM, a process reward model for scientific reasoning that verifies tool usage and factual consistency. It improves test-time scaling through Best-of-N selection and acts as a dense reward signal in reinforcement learning, allowing models to surpass previous performance ceilings.

While Process Reward Models (PRMs) have achieved remarkable success in mathematical reasoning, their application in complex scientific domains such as biology, chemistry, and physics remains largely unexplored. Scientific problems demand not only logical rigor but also factual consistency and the precise usage of domain-specific tools, areas where current models often suffer from hallucinations and a lack of verification. In this paper, we first construct SCIPRM70K, a large-scale dataset featuring Chain-of-Tool trajectories that explicitly interleave reasoning with the execution of scientific tools. Building upon this, we train an efficient reward model called Sci-PRM that provides fine-grained supervision on tool selection, execution accuracy, and result interpretation at each step in a single inference pass. Experiments demonstrate that Sci-PRM significantly enhances foundation models in two key aspects: (1) it enables effective test-time scaling via Best-of-N selection; and (2) when integrated into Reinforcement Learning, it serves as a dense reward signal that mitigates the critical issue of advantage disappearance, allowing the model to break through existing performance ceilings.
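The two uses named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how per-step scores are aggregated or which RL algorithm is used, so the min-aggregation rule, the group-normalized advantage (in the style of group-relative policy optimization methods), the helper names, and all scores below are assumptions made purely for illustration.

```python
from statistics import mean, pstdev

def trajectory_score(step_scores):
    # Assumption: a trajectory is only as good as its weakest step.
    return min(step_scores)

def best_of_n(candidates):
    # candidates: list of (answer, per-step PRM scores).
    # Returns the answer whose trajectory has the highest aggregate score.
    return max(candidates, key=lambda c: trajectory_score(c[1]))[0]

def group_advantages(rewards, eps=1e-8):
    # Group-normalized advantage: (r - group mean) / (group std + eps).
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical step scores for three sampled trajectories (made-up numbers).
candidates = [
    ("A", [0.90, 0.20, 0.80]),  # one weak step (e.g. a wrong tool call)
    ("B", [0.70, 0.80, 0.75]),  # consistently sound
    ("C", [0.95, 0.90, 0.10]),  # misread the final tool result
]

# (1) Best-of-N selection using the PRM as a verifier.
best = best_of_n(candidates)

# (2) Advantage disappearance: if every rollout in a group gets the same
# outcome reward (here, all wrong), the normalized advantages are all zero
# and the policy receives no learning signal.
outcome_rewards = [0.0, 0.0, 0.0]
outcome_adv = group_advantages(outcome_rewards)

# Adding a dense term (mean step score, an assumed shaping choice) separates
# the rollouts again, so the advantages are no longer all zero.
dense_rewards = [o + mean(s) for o, (_, s) in zip(outcome_rewards, candidates)]
dense_adv = group_advantages(dense_rewards)

print(best)
print(outcome_adv)
print(dense_adv)
```

In this toy setup, min-aggregation picks trajectory B because it has no weak step, and the dense term restores a non-zero advantage even though all three rollouts share the same outcome reward.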
