AI · Sep 20, 2025

Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories

Mohammad Beigi, Ying Shen, Parshin Shojaee, Qifan Wang, Zichao Wang, Chandan Reddy, Ming Jin, Lifu Huang

arXiv:2509.16742v1 · 12.47 citations · h-index: 10 · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses sycophancy in AI assistants with the goal of building more truthful and aligned models. It appears incremental, however, since it builds on existing reinforcement learning and reasoning methods.

The paper tackles sycophancy in large language models, i.e., their tendency to agree with incorrect user-provided information. It introduces SMART, a framework that reframes sycophancy as a reasoning optimization problem and combines uncertainty-aware adaptive reasoning with reinforcement learning. The result is a significant reduction in sycophantic behavior while performance on out-of-distribution inputs and general capabilities is preserved.

Despite the remarkable capabilities of large language models, current training paradigms inadvertently foster sycophancy, i.e., the tendency of a model to agree with or reinforce user-provided information even when it's factually incorrect. To address this challenge, we introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories), which reframes sycophancy as a reasoning optimization problem rather than an output alignment issue. SMART is a two-stage framework comprising: (1) Uncertainty-Aware Adaptive Monte Carlo Tree Search (UA-MCTS), which dynamically adjusts model exploration based on state-level uncertainty to collect high-quality, diverse reasoning trajectories alongside both stepwise progress and final outcome rewards; and (2) progress-based reinforcement learning, which fine-tunes the model using the collected trajectories and reward signals to reinforce effective reasoning patterns. Through extensive experiments, we show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs and maintaining general capabilities. These results underscore the importance of optimizing internal reasoning mechanisms to build more truthful and aligned AI assistants.
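The abstract does not say how state-level uncertainty is measured or how the stepwise progress reward is computed. The sketch below is one plausible reading, not the authors' implementation. It assumes that (a) uncertainty is the normalized entropy of the policy's distribution over candidate next reasoning steps, (b) that uncertainty widens both the expansion width and the UCT exploration bonus, and (c) the progress reward for a step is the change in an estimated probability of reaching a correct answer, with the final outcome reward added at the last step. All names (`Node`, `adaptive_width`, `select_child`, `progress_rewards`) and default parameter values are hypothetical.

```python
import math
from dataclasses import dataclass, field


def normalized_entropy(probs: list[float]) -> float:
    """Entropy of a next-step distribution divided by log(k), so the result lies in [0, 1]."""
    k = len(probs)
    if k <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)


@dataclass
class Node:
    # Hypothetical: the policy's probabilities over candidate next reasoning steps at this state.
    prior_probs: list[float]
    visits: int = 0
    value_sum: float = 0.0
    children: list["Node"] = field(default_factory=list)

    @property
    def uncertainty(self) -> float:
        # Assumed state-level uncertainty measure: normalized entropy of prior_probs.
        return normalized_entropy(self.prior_probs)

    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited node.
        return self.value_sum / self.visits if self.visits else 0.0


def adaptive_width(node: Node, min_width: int = 2, max_width: int = 6) -> int:
    """Assumed rule: expand more candidate steps at uncertain states, fewer at confident ones."""
    return min_width + round(node.uncertainty * (max_width - min_width))


def select_child(node: Node, c_base: float = 1.0) -> Node:
    """UCT selection whose exploration bonus grows with the parent's uncertainty (assumed form)."""
    c = c_base * (1.0 + node.uncertainty)

    def score(child: Node) -> float:
        if child.visits == 0:
            return math.inf  # always try unvisited children first
        return child.q() + c * math.sqrt(math.log(node.visits) / child.visits)

    return max(node.children, key=score)


def progress_rewards(values: list[float], outcome_correct: bool,
                     outcome_weight: float = 1.0) -> list[float]:
    """Per-step reward = change in estimated success probability between consecutive states.

    values[t] is a (hypothetical) estimate of P(correct final answer | state t);
    the final outcome reward is added to the last step's progress reward.
    """
    rewards = [values[t] - values[t - 1] for t in range(1, len(values))]
    if rewards:
        rewards[-1] += outcome_weight * (1.0 if outcome_correct else 0.0)
    return rewards
```

For instance, `progress_rewards([0.2, 0.5, 0.4], outcome_correct=True)` gives roughly `[0.3, 0.9]`: the second step lowered the estimated success probability, but the trajectory still ended correctly. Scaling the exploration coefficient by `1 + uncertainty` leaves standard UCT unchanged at fully confident states and doubles the bonus at maximally uncertain ones; a real system would presumably tune or learn this schedule.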
