How LLMs Fail to Support Fact-Checking
The paper tackles the problem of using LLMs to fact-check political misinformation, finding that models such as ChatGPT, Gemini, and Claude struggle to ground their responses in real news sources and tend to favor left-leaning sources.
These results raise concerns, for researchers and non-technical users alike, about relying on prompt-engineered LLMs for fact-checking; the work is an incremental study of these existing limitations.
While Large Language Models (LLMs) can amplify online misinformation, they also show promise in countering it. In this paper, we empirically study the capabilities of three LLMs (ChatGPT, Gemini, and Claude) in countering political misinformation. We implement a two-step, chain-of-thought prompting approach in which models first identify credible sources for a given claim and then generate persuasive responses. Our findings suggest that the models struggle to ground their responses in real news sources and tend to prefer citing left-leaning sources. We also observe varying degrees of response diversity among the models. Our findings highlight concerns about using LLMs for fact-checking through prompt engineering alone, emphasizing the need for more robust guardrails. Our results have implications for both researchers and non-technical users.
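The two-step prompting structure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the helper names (`build_source_prompt`, `parse_sources`, `build_response_prompt`, `two_step_counter`), and the one-source-per-line reply format are all hypothetical. The model is abstracted as any callable that maps a prompt string to a reply string, so no real ChatGPT, Gemini, or Claude API is assumed; `fake_llm` is a hypothetical offline stub used only to show the data flow.

```python
from typing import Callable, List, Tuple

# Hypothetical abstraction: any function mapping a prompt string to a model reply.
LLM = Callable[[str], str]


def build_source_prompt(claim: str) -> str:
    """Step 1 (hypothetical wording): ask the model to name credible sources."""
    return (
        "Claim: " + claim + "\n"
        "List credible news sources that address this claim, one per line."
    )


def parse_sources(reply: str) -> List[str]:
    """Treat each non-empty line of the step-1 reply as one cited source,
    dropping leading bullet markers such as '-' or '*'."""
    cleaned = (line.strip().lstrip("-*").strip() for line in reply.splitlines())
    return [s for s in cleaned if s]


def build_response_prompt(claim: str, sources: List[str]) -> str:
    """Step 2 (hypothetical wording): ask for a persuasive, source-grounded reply."""
    cited = "\n".join(f"- {s}" for s in sources)
    return (
        "Claim: " + claim + "\n"
        "Sources:\n" + cited + "\n"
        "Using only these sources, write a persuasive response that corrects the claim."
    )


def two_step_counter(claim: str, llm: LLM) -> Tuple[List[str], str]:
    """Run both steps: first collect sources, then condition the response on them."""
    sources = parse_sources(llm(build_source_prompt(claim)))
    response = llm(build_response_prompt(claim, sources))
    return sources, response


def fake_llm(prompt: str) -> str:
    """Hypothetical offline stub standing in for a real model."""
    if "List credible" in prompt:
        return "- Reuters\n- AP News\n"
    return "Reuters and AP News report no evidence supporting this claim."


if __name__ == "__main__":
    srcs, reply = two_step_counter("An example political claim.", fake_llm)
    print(srcs)
    print(reply)
```

Because the sources come from the model itself in step 1, nothing in this structure checks that they are real or politically balanced; that gap is where the grounding and source-bias problems reported above would surface, and a practical pipeline would need an external check on the step-1 output.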