CL Jun 26, 2025

(Fact) Check Your Bias

Eivind Morris Bakke, Nora Winger Heggelund

arXiv:2506.21745v1, 1 citation, Has Code, Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)

Originality: Incremental advance

AI Analysis

This work addresses bias in automated fact verification systems. It is an incremental advance in that it builds on the existing HerO system, the baseline for FEVER-25.

The study investigated how parametric knowledge biases in large language models affect fact-checking outcomes. It found that Llama 3.1, when prompted directly, labeled nearly half of the claims as "Not Enough Evidence", and that injected biases changed which evidence was retrieved while final verdict predictions remained stable.

Automatic fact verification systems increasingly rely on large language models (LLMs). We investigate how parametric knowledge biases in these models affect the fact-checking outcomes of the HerO system (the baseline for FEVER-25). We examine how the system is affected by: (1) potential bias in Llama 3.1's parametric knowledge and (2) intentionally injected bias. When prompted directly to perform fact verification, Llama 3.1 labels nearly half the claims as "Not Enough Evidence". Using only its parametric knowledge, it is able to reach a verdict on the remaining half of the claims. In the second experiment, we prompt the model to generate supporting, refuting, or neutral fact-checking documents. These prompts significantly influence retrieval outcomes, with approximately 50% of retrieved evidence being unique to each perspective. Notably, the model sometimes refuses to generate supporting documents for claims it believes to be false, creating an inherent negative bias. Despite differences in retrieved evidence, final verdict predictions show stability across prompting strategies. The code is available at: https://github.com/eibakke/FEVER-8-Shared-Task
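To make the "unique to each perspective" figure concrete, here is a minimal sketch of one way such a share could be computed. It is an illustration under stated assumptions, not the authors' metric or code: the function name `unique_share`, the evidence IDs, and the toy retrieval sets are all hypothetical, and the numbers are not results from the paper.

```python
def unique_share(retrieved: dict[str, set[str]]) -> dict[str, float]:
    """For each perspective, return the fraction of its retrieved evidence
    IDs that no other perspective retrieved (assumed definition)."""
    shares = {}
    for name, docs in retrieved.items():
        # Union of evidence retrieved under every other perspective.
        others = set().union(*(d for n, d in retrieved.items() if n != name))
        shares[name] = len(docs - others) / len(docs) if docs else 0.0
    return shares


# Toy example: hypothetical evidence IDs retrieved for a single claim.
retrieved = {
    "supporting": {"e1", "e2", "e3", "e4"},
    "refuting": {"e3", "e4", "e5", "e6"},
    "neutral": {"e4", "e7", "e8", "e9"},
}
shares = unique_share(retrieved)
```

In this toy data, half of the evidence retrieved under the supporting and refuting prompts appears under no other prompt. Averaging such per-claim shares over a dataset would be one plausible way to arrive at an aggregate figure like the one quoted above.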

