CL · May 21, 2025

An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations

arXiv:2505.15392v1 · 5 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses cognitive bias concerns in LLMs for AI safety and evaluation, but it is incremental as it builds on known psychological concepts.

The study investigated the anchoring effect in Large Language Models (LLMs), finding that this cognitive bias commonly exists with shallow-layer mechanisms and is not eliminated by conventional strategies, though reasoning offers some mitigation.

The rise of Large Language Models (LLMs) like ChatGPT has advanced natural language processing, yet concerns about their cognitive biases are growing. In this paper, we investigate the anchoring effect, a cognitive bias in which the mind relies heavily on the first piece of information it receives (the anchor) when making subsequent judgments. We explore whether LLMs are affected by anchoring, the underlying mechanisms, and potential mitigation strategies. To facilitate studies of the anchoring effect at scale, we introduce a new dataset, SynAnchors. Using refined evaluation metrics, we benchmark widely used LLMs. Our findings show that anchoring bias is common in LLMs, acts at shallow layers, and is not eliminated by conventional strategies, while reasoning can offer some mitigation. This recontextualization via cognitive psychology urges that LLM evaluations focus not on standard benchmarks or over-optimized robustness tests, but on cognitive-bias-aware trustworthy evaluation.

