CL · Aug 11, 2025

Echoes of Agreement: Argument Driven Opinion Shifts in Large Language Models

arXiv:2508.09759v1 · h-index: 4

Originality: Incremental advance

AI Analysis

For researchers and developers, this work addresses how robust political bias evaluations of LLMs are, and it reveals sycophantic tendencies that could affect mitigation strategies.

The study examines how sensitive political bias evaluations of large language models are to prompts that contain arguments. It finds that supporting or refuting arguments substantially shift model responses in the argument's direction, and that argument strength influences agreement rates.

There have been numerous studies evaluating the bias of LLMs towards political topics. However, the positions that model outputs take on these topics are highly sensitive to the prompt. What happens when the prompt itself is suggestive of certain arguments towards those positions remains underexplored. This is crucial for understanding how robust these bias evaluations are and for understanding model behaviour, as these models frequently interact with opinionated text. To that end, we conduct experiments for political bias evaluation in the presence of supporting and refuting arguments. Our experiments show that such arguments substantially shift model responses in the direction of the provided argument, in both single-turn and multi-turn settings. Moreover, we find that the strength of these arguments influences the directional agreement rate of model responses. These effects point to a sycophantic tendency in LLMs to adapt their stance to align with the presented arguments, which has downstream implications for measuring political bias and developing effective mitigation strategies.
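The abstract does not define the directional agreement rate, so the following is only a minimal sketch under an assumed definition: the fraction of items whose stance score moves toward the argument's direction relative to a no-argument baseline. The function name, the numeric stance scale, and the sample values are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def directional_agreement_rate(
    baseline: Sequence[float],
    with_argument: Sequence[float],
    direction: Sequence[int],
) -> float:
    """Fraction of items whose stance moved toward the argument.

    ASSUMED definition, not the paper's:
    baseline[i]      : stance score without an argument (hypothetical scale)
    with_argument[i] : stance score for the same item with the argument added
    direction[i]     : +1 for a supporting argument, -1 for a refuting one
    """
    if not (len(baseline) == len(with_argument) == len(direction)):
        raise ValueError("all inputs must have the same length")
    if not baseline:
        raise ValueError("need at least one item")
    # An item counts as agreeing when its shift has the same sign as the argument.
    moved = sum(
        1
        for b, a, d in zip(baseline, with_argument, direction)
        if (a - b) * d > 0
    )
    return moved / len(baseline)


# Made-up illustrative scores, not experimental data.
baseline = [0, 1, -1, 2]
with_argument = [1, 0, -1, 1]
direction = [+1, -1, +1, +1]
rate = directional_agreement_rate(baseline, with_argument, direction)
print(rate)
```

Under this assumed definition, an unchanged response counts as non-agreement. A real evaluation might treat unchanged or already-saturated responses differently.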
