CL · Jan 7

Whose Facts Win? LLM Source Preferences under Knowledge Conflicts

Jakob Schuster, Vagrant Gautam, Katja Markert

arXiv:2601.03746v1

Originality: Incremental advance

AI Analysis

This work addresses the problem of inconsistent handling of source credibility in retrieval-augmented generation (RAG) for NLP applications, though it is an incremental advance over existing frameworks.

The study investigates how large language models (LLMs) resolve knowledge conflicts according to source preferences. It finds that they prefer institutionally-corroborated sources over social media, but that this preference can be reversed by repeating information from less credible sources. The authors propose a method that reduces repetition bias by up to 99.8% while maintaining at least 88.8% of original preferences.

As large language models (LLMs) are more frequently used in retrieval-augmented generation pipelines, it is increasingly relevant to study their behavior under knowledge conflicts. Thus far, the role of the source of the retrieved information has gone unexamined. We address this gap with a novel framework to investigate how source preferences affect LLM resolution of inter-context knowledge conflicts in English, motivated by interdisciplinary research on credibility. With a comprehensive, tightly-controlled evaluation of 13 open-weight LLMs, we find that LLMs prefer institutionally-corroborated information (e.g., government or newspaper sources) over information from people and social media. However, these source preferences can be reversed by simply repeating information from less credible sources. To mitigate repetition effects and maintain consistent preferences, we propose a novel method that reduces repetition bias by up to 99.8%, while also maintaining at least 88.8% of original preferences. We release all data and code to encourage future work on credibility and source preferences in knowledge-intensive NLP.
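The abstract does not specify the prompt format or how preferences and repetition bias are scored. The following is a minimal Python sketch, under stated assumptions, of how an inter-context conflict with attributed sources and a repetition manipulation could be set up. The `Conflict` fields, the prompt template, `preference_rate`, the relative-reduction formula, and `toy_majority_model` (a stand-in for a real LLM that simply returns the most-repeated claim) are all hypothetical and are not the authors' released code or metrics.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Conflict:
    """Hypothetical record for one inter-context knowledge conflict."""
    question: str
    answer_a: str  # claim attributed to source A
    answer_b: str  # conflicting claim attributed to source B
    source_a: str  # e.g. "a government report" (illustrative label)
    source_b: str  # e.g. "a social media post" (illustrative label)


def build_prompt(c: Conflict, repeat_b: int = 1) -> str:
    """Put both claims in context; source B's claim is repeated repeat_b times."""
    docs = [f"According to {c.source_a}: {c.answer_a}"]
    docs += [f"According to {c.source_b}: {c.answer_b}"] * repeat_b
    return "\n".join(docs) + f"\n\nQuestion: {c.question}\nAnswer:"


def preference_rate(model: Callable[[str], str],
                    conflicts: List[Conflict], repeat_b: int = 1) -> float:
    """Fraction of decided cases in which the model's answer matches source A.

    Answers matching neither claim are ignored (an assumption of this sketch).
    """
    picks_a = picks_b = 0
    for c in conflicts:
        out = model(build_prompt(c, repeat_b)).strip()
        if out == c.answer_a:
            picks_a += 1
        elif out == c.answer_b:
            picks_b += 1
    decided = picks_a + picks_b
    return picks_a / decided if decided else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Assumed definition: percent reduction of a bias score, (before - after) / before."""
    if before == 0:
        raise ValueError("baseline bias must be non-zero")
    return 100.0 * (before - after) / before


def toy_majority_model(prompt: str) -> str:
    """Stand-in 'LLM': returns the most frequent claim in context.

    Python's max() returns the first maximal item, so ties go to source A.
    """
    claims = [line.split(": ", 1)[1] for line in prompt.splitlines()
              if line.startswith("According to ")]
    return max(claims, key=claims.count)


if __name__ == "__main__":
    c = Conflict("When did the bridge open?", "1932", "1936",
                 "a government report", "a social media post")
    print(preference_rate(toy_majority_model, [c], repeat_b=1))
    print(preference_rate(toy_majority_model, [c], repeat_b=3))
```

With one copy of each claim, the toy model sides with the first-listed source; repeating the other source's claim three times flips its answer. This is a deliberately crude caricature of the repetition effect the abstract reports, meant only to show where such an effect would surface in a preference measurement.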

