CL, AI · Feb 28, 2025

How LLMs Fail to Support Fact-Checking

Adiba Mahbub Proma, Neeley Pate, James Druckman, Gourab Ghoshal, Hangfeng He, Ehsan Hoque

arXiv:2503.01902v2 · 4 citations · h-index: 24

Originality: Synthesis-oriented

AI Analysis

The findings raise concerns for researchers and non-technical users who rely on prompt-engineered LLMs for fact-checking. The study is incremental, building on known limitations of these models.

The paper examines the use of LLMs to fact-check political misinformation and finds that ChatGPT, Gemini, and Claude struggle to ground their responses in real news sources and tend to favor left-leaning sources.

While Large Language Models (LLMs) can amplify online misinformation, they also show promise in tackling misinformation. In this paper, we empirically study the capabilities of three LLMs -- ChatGPT, Gemini, and Claude -- in countering political misinformation. We implement a two-step, chain-of-thought prompting approach, where models first identify credible sources for a given claim and then generate persuasive responses. Our findings suggest that models struggle to ground their responses in real news sources, and tend to prefer citing left-leaning sources. We also observe varying degrees of response diversity among models. Our findings highlight concerns about using LLMs for fact-checking through only prompt-engineering, emphasizing the need for more robust guardrails. Our results have implications for both researchers and non-technical users.
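The two-step prompting flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, and the `ask_model` callable (a stand-in for whichever LLM client is used) are all assumptions made for this example.

```python
from typing import Callable, Dict


def build_source_prompt(claim: str) -> str:
    # Hypothetical wording for step 1; the paper's actual prompts are not shown here.
    return (
        "Think step by step. List credible news sources that address "
        f"the following claim:\n{claim}"
    )


def build_response_prompt(claim: str, sources_text: str) -> str:
    # Hypothetical wording for step 2, which reuses the output of step 1.
    return (
        "Using only the sources below, write a persuasive response "
        f"to the claim.\nClaim: {claim}\nSources:\n{sources_text}"
    )


def two_step_fact_check(
    claim: str, ask_model: Callable[[str], str]
) -> Dict[str, str]:
    """Run source identification, then response generation.

    `ask_model` is an assumed interface: it takes a prompt string and
    returns the model's text reply.
    """
    sources_text = ask_model(build_source_prompt(claim))
    response = ask_model(build_response_prompt(claim, sources_text))
    return {"claim": claim, "sources": sources_text, "response": response}
```

Because the paper reports that models struggle to ground their responses in real news sources, the `sources` field returned by a pipeline like this should be treated as unverified until each cited source is checked independently.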
