AI · CL · Jan 20

Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

Chak Tou Leong, Dingwei Chen, Heming Xia, Qingyu Yin, Sunbowen Lee, Jian Wang, Wenjie Li

arXiv:2601.13752v1 · 2.4 · h-index: 10

Originality: Incremental advance

AI Analysis

This paper addresses the expensive, hard-to-scale supervision currently needed to shape reasoning behavior in AI models. The advance is incremental, since it builds on existing fine-tuning and probing techniques.

The paper tackles computational redundancy and reasoning unfaithfulness in large reasoning models by proposing RELIEF, a framework that shapes model behavior without reasoning supervision and matches or exceeds baselines at lower training cost.

Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or reasoning unfaithfulness. Current methods for shaping LRM behavior typically rely on reinforcement learning or fine-tuning with gold-standard reasoning traces, a paradigm that is both computationally expensive and difficult to scale. In this paper, we reveal that LRMs possess latent reasoning beliefs that internally track their own reasoning traits, which can be captured through simple logit probing. Building upon this insight, we propose Reasoning Belief Engineering (RELIEF), a simple yet effective framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint. Crucially, RELIEF completely bypasses the need for reasoning-trace supervision. It internalizes desired traits by fine-tuning on synthesized, self-reflective question-answering pairs that affirm the target belief. Extensive experiments on efficiency and faithfulness tasks demonstrate that RELIEF matches or outperforms behavior-supervised and preference-based baselines while requiring lower training costs. Further analysis validates that shifting a model's reasoning belief effectively shapes its actual behavior.
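
The two ingredients named in the abstract, logit probing and self-reflective question-answering pairs, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The two-option probe, the function names belief_score and make_belief_pairs, and the answer template are all assumptions made for illustration, and the logits are taken as plain numbers rather than read from a real model.

```python
import math


def belief_score(logit_affirm: float, logit_deny: float) -> float:
    """Hypothetical probe: share of probability on the affirming option.

    Assumes the model was asked a self-reflective question with two
    candidate answers and that we have the logit of each answer token.
    """
    # Two-way softmax, stabilized by subtracting the larger logit.
    m = max(logit_affirm, logit_deny)
    e_affirm = math.exp(logit_affirm - m)
    e_deny = math.exp(logit_deny - m)
    return e_affirm / (e_affirm + e_deny)


def make_belief_pairs(trait: str, questions: list[str]) -> list[dict[str, str]]:
    """Hypothetical data builder: QA pairs whose answers affirm a target trait.

    The answer template is an illustrative assumption, not the paper's.
    """
    return [
        {"question": q, "answer": f"My reasoning is {trait}."}
        for q in questions
    ]


# Illustrative values only: made-up logits and made-up prompts.
score = belief_score(logit_affirm=2.0, logit_deny=0.0)
pairs = make_belief_pairs(
    "concise, with no redundant steps",
    ["How would you describe your reasoning style?",
     "Do you tend to repeat steps when solving problems?"],
)
```

In a real setting the two logits would come from a language model's next-token distribution, and the pairs would serve as a fine-tuning dataset; both steps are left out here because the listing does not specify them.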
