CL · Feb 5

Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

arXiv:2602.05897v1 · 1 citation · h-index: 10 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses reliability issues in small reasoning models for resource-constrained settings, though it is an incremental improvement over existing reinforcement learning methods.

The paper tackles the problem of faithfulness hallucinations in small reasoning models during chain-of-thought reasoning by proposing FaithRL, which uses step-level supervision and implicit truncated resampling. Experiments show it reduces hallucinations in CoT and final answers across multiple benchmarks.

As large language models become smaller and more efficient, small reasoning models (SRMs) are crucial for enabling chain-of-thought (CoT) reasoning in resource-constrained settings. However, they are prone to faithfulness hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained CoT evaluation, which can inadvertently reinforce unfaithful reasoning when the final answer is correct. To address these limitations, we propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model, together with an implicit truncated resampling strategy that generates contrastive signals from faithful prefixes. Experiments across multiple SRMs and Open-Book QA benchmarks demonstrate that FaithRL consistently reduces hallucinations in both the CoT and final answers, leading to more faithful and reliable reasoning. Code is available at https://github.com/Easy195/FaithRL.
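The contrast the abstract draws between outcome-based and step-level rewards can be illustrated with a minimal Python sketch. This is not the FaithRL implementation: the function names, the 0.5 threshold, the -1.0 penalty, and the faithfulness scores are all hypothetical, and the abstract does not specify how the process reward model scores steps or how the "implicit" resampling is carried out. The sketch only shows why an outcome-only reward can reinforce an unfaithful step when the final answer is correct, and how a faithful prefix might be selected as a starting point for resampling.

```python
from typing import List


def outcome_only_rewards(num_steps: int, answer_correct: bool) -> List[float]:
    """Coarse baseline: every step inherits the final-answer reward."""
    reward = 1.0 if answer_correct else 0.0
    return [reward] * num_steps


def step_level_rewards(
    faithfulness: List[float],
    answer_correct: bool,
    threshold: float = 0.5,   # hypothetical cutoff, not from the paper
    penalty: float = -1.0,    # hypothetical penalty, not from the paper
) -> List[float]:
    """Penalize steps that a (hypothetical) process reward model marks unfaithful."""
    outcome = 1.0 if answer_correct else 0.0
    return [outcome if score >= threshold else penalty for score in faithfulness]


def faithful_prefix(
    steps: List[str], faithfulness: List[float], threshold: float = 0.5
) -> List[str]:
    """Keep steps up to, but not including, the first unfaithful one."""
    prefix: List[str] = []
    for step, score in zip(steps, faithfulness):
        if score < threshold:
            break
        prefix.append(step)
    return prefix


# Toy chain of thought; the third step is not supported by the context.
steps = [
    "The passage says the bridge opened in 1932.",
    "So it is older than the tunnel, which was built in 1950.",
    "The bridge was designed by a famous architect.",
    "Answer: the bridge.",
]
scores = [0.9, 0.8, 0.1, 0.7]  # made-up per-step faithfulness scores

print(outcome_only_rewards(len(steps), answer_correct=True))
print(step_level_rewards(scores, answer_correct=True))
print(faithful_prefix(steps, scores))
```

In this toy case the outcome-only reward gives the unsupported third step the same credit as the faithful ones, while the step-level variant singles it out. The two-step prefix returned by `faithful_prefix` is the kind of truncation point from which new continuations could be sampled to produce contrastive signals.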
