LLM-Based Adversarial Persuasion Attacks on Fact-Checking Systems

João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

arXiv:2601.16890v11.11 citations · h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses a critical vulnerability in the fact-checking systems used to combat disinformation. Its contribution is incremental: it explores a new attack vector rather than introducing a paradigm shift.

The paper examines the susceptibility of automated fact-checking systems to adversarial attacks. It introduces a novel class of persuasive attacks in which an LLM rephrases claims using persuasion techniques, and it reports substantial degradation of both verification performance and evidence retrieval on the FEVER and FEVEROUS benchmarks.

Automated fact-checking (AFC) systems are susceptible to adversarial attacks, enabling false claims to evade detection. Existing adversarial frameworks typically rely on injecting noise or altering semantics, yet no existing framework exploits the adversarial potential of persuasion techniques, which are widely used in disinformation campaigns to manipulate audiences. In this paper, we introduce a novel class of persuasive adversarial attacks on AFCs by employing a generative LLM to rephrase claims using persuasion techniques. Considering 15 techniques grouped into 6 categories, we study the effects of persuasion on both claim verification and evidence retrieval using a decoupled evaluation strategy. Experiments on the FEVER and FEVEROUS benchmarks show that persuasion attacks can substantially degrade both verification performance and evidence retrieval. Our analysis identifies persuasion techniques as a potent class of adversarial attacks, highlighting the need for more robust AFC systems.
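
The attack described in the abstract can be pictured with a minimal sketch: an LLM rewrites each claim with a chosen persuasion technique, and the verifier's accuracy is compared before and after the rewrite. This is an illustration under stated assumptions, not the authors' implementation. The prompt wording, the function names, and the `verifier` and `rewriter` callables are all hypothetical stand-ins, and the sketch covers only the claim-verification side of the decoupled evaluation, not evidence retrieval.

```python
from typing import Callable, Sequence, Tuple

# A labelled claim: (claim text, gold verdict). The label strings are assumptions.
Claim = Tuple[str, str]


def build_rephrase_prompt(claim: str, technique: str) -> str:
    """Build an instruction asking an LLM to rephrase a claim.

    The wording is a hypothetical prompt, not the one used in the paper.
    """
    return (
        f"Rewrite the following claim using the persuasion technique "
        f"'{technique}'. Keep the factual content unchanged.\n\n"
        f"Claim: {claim}"
    )


def accuracy(verifier: Callable[[str], str], claims: Sequence[Claim]) -> float:
    """Fraction of claims whose predicted verdict matches the gold label."""
    correct = sum(1 for text, label in claims if verifier(text) == label)
    return correct / len(claims)


def attack_accuracy_drop(
    verifier: Callable[[str], str],
    rewriter: Callable[[str], str],
    claims: Sequence[Claim],
    technique: str,
) -> float:
    """Accuracy on the original claims minus accuracy on the rephrased claims.

    `rewriter` stands in for a call to a generative LLM: it receives the
    prompt and returns the rephrased claim text. Gold labels are reused,
    on the assumption that rephrasing does not change the claim's veracity.
    """
    attacked = [
        (rewriter(build_rephrase_prompt(text, technique)), label)
        for text, label in claims
    ]
    return accuracy(verifier, claims) - accuracy(verifier, attacked)
```

A positive return value means the rephrased claims were verified less accurately than the originals. Running the same comparison once per technique would give a per-technique view of the degradation.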

