CL · Feb 5

Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Shuo Nie, Hexuan Deng, Chao Wang, Ruiyu Fang, Xuebo Liu, Shuangyong Song, Yu Li, Min Zhang, Xuelong Li

arXiv:2602.05897v11.12 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses reliability issues in small reasoning models deployed in resource-constrained settings, though it is an incremental improvement over existing reinforcement learning methods.

The paper tackles faithfulness hallucinations in the chain-of-thought (CoT) reasoning of small reasoning models by proposing FaithRL, which combines step-level supervision with implicit truncated resampling. Experiments show that it reduces hallucinations in both the CoT and the final answers across multiple benchmarks.

As large language models become smaller and more efficient, small reasoning models (SRMs) are crucial for enabling chain-of-thought (CoT) reasoning in resource-constrained settings. However, they are prone to faithfulness hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained CoT evaluation, which can inadvertently reinforce unfaithful reasoning when the final answer is correct. To address these limitations, we propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model, together with an implicit truncated resampling strategy that generates contrastive signals from faithful prefixes. Experiments across multiple SRMs and Open-Book QA benchmarks demonstrate that FaithRL consistently reduces hallucinations in both the CoT and final answers, leading to more faithful and reliable reasoning. Code is available at https://github.com/Easy195/FaithRL.
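To illustrate why step-level supervision differs from an outcome-only reward, here is a minimal Python sketch. It is not the FaithRL implementation: the abstract does not give the reward formula, so the function names (`step_rewards`, `faithful_prefix_len`), the 0.5 threshold, the +1/-1 step rewards, and the outcome weight are all assumptions made for illustration. The per-step faithfulness scores stand in for the output of a process reward model.

```python
from typing import List


def step_rewards(
    faith_scores: List[float],
    answer_correct: bool,
    threshold: float = 0.5,       # assumed cutoff, not taken from the paper
    outcome_weight: float = 1.0,  # assumed weight, not taken from the paper
) -> List[float]:
    """Assign +1 to steps scored as faithful and -1 to the rest,
    then add the outcome reward to the final step only."""
    rewards = [1.0 if s >= threshold else -1.0 for s in faith_scores]
    if rewards:
        rewards[-1] += outcome_weight if answer_correct else -outcome_weight
    return rewards


def faithful_prefix_len(faith_scores: List[float], threshold: float = 0.5) -> int:
    """Return how many leading steps are all scored as faithful.
    A resampling scheme could resume generation from this prefix."""
    n = 0
    for s in faith_scores:
        if s < threshold:
            break
        n += 1
    return n


# Hypothetical scores for a 4-step chain of thought whose final answer is correct.
scores = [0.9, 0.8, 0.2, 0.7]
print(step_rewards(scores, answer_correct=True))  # [1.0, 1.0, -1.0, 2.0]
print(faithful_prefix_len(scores))                # 2
```

In this toy case the final answer is correct, so an outcome-only reward would credit the whole chain, including the third step that was scored as unfaithful. The step-level version penalizes that step, and the faithful prefix of length 2 marks where a contrastive continuation could be resampled.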

