CL Jan 21

Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks

Sahar Tahmasebi, Eric Müller-Budack, Ralph Ewerth

arXiv:2601.15277v12.13 citations h-index: 10

Originality: Incremental advance

AI Analysis

The paper addresses adversarial sentiment attacks on fake news detection systems. It is an incremental advance focused on one specific vulnerability in misinformation detection.

The paper tackles the vulnerability of fake news detectors to adversaries who use large language models to manipulate sentiment. It shows that changing an article's sentiment heavily impacts detection performance and that detectors are biased toward classifying neutral articles as real. It introduces AdSent, a sentiment-robust detection framework that significantly outperforms baselines in accuracy and robustness across three benchmark datasets.

Misinformation and fake news have become a pressing societal challenge, driving the need for reliable automated detection methods. Prior research has highlighted sentiment as an important signal in fake news detection, either by analyzing which sentiments are associated with fake news or by using sentiment and emotion features for classification. However, this reliance on sentiment poses a vulnerability, since adversaries can manipulate sentiment to evade detectors, especially with the advent of large language models (LLMs). A few studies have explored adversarial samples generated by LLMs, but they mainly focus on stylistic features such as the writing style of news publishers. Thus, the crucial vulnerability of sentiment manipulation remains largely unexplored. In this paper, we investigate the robustness of state-of-the-art fake news detectors under sentiment manipulation. We introduce AdSent, a sentiment-robust detection framework designed to ensure consistent veracity predictions across both original and sentiment-altered news articles. Specifically, we (1) propose controlled sentiment-based adversarial attacks using LLMs, (2) analyze the impact of sentiment shifts on detection performance, and (3) introduce a novel sentiment-agnostic training strategy that enhances robustness against such perturbations. Our analysis shows that changing the sentiment heavily impacts the performance of fake news detection models, indicating a bias towards classifying neutral articles as real, while non-neutral articles are often classified as fake. Extensive experiments on three benchmark datasets demonstrate that AdSent significantly outperforms competitive baselines in both accuracy and robustness, while also generalizing effectively to unseen datasets and adversarial scenarios.
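
The abstract's goal of "consistent veracity predictions across both original and sentiment-altered news articles" can be made concrete with a simple consistency measure. The sketch below is illustrative only and is not taken from the paper: `flip_rate`, `toy_detector`, and the example texts are hypothetical names and data, and the keyword-based detector merely stands in for a real model. It computes the fraction of articles whose predicted label changes once the sentiment has been rewritten.

```python
from typing import Callable, Sequence


def flip_rate(
    predict: Callable[[str], int],
    originals: Sequence[str],
    altered: Sequence[str],
) -> float:
    """Fraction of aligned (original, altered) pairs whose predicted label differs."""
    if len(originals) != len(altered):
        raise ValueError("originals and altered must be aligned pairs")
    if not originals:
        return 0.0
    flips = sum(predict(o) != predict(a) for o, a in zip(originals, altered))
    return flips / len(originals)


# Hypothetical stand-in detector: labels text containing "shocking" as fake (1),
# everything else as real (0). A real detector would be a trained model.
def toy_detector(text: str) -> int:
    return 1 if "shocking" in text.lower() else 0


# Hypothetical article pairs: each altered text is a sentiment-shifted rewrite
# of the original at the same index.
originals = [
    "Council approves budget.",
    "Shocking cure found!",
    "Rain expected Tuesday.",
    "Shocking scandal rocks town!",
]
altered = [
    "Shocking: council approves budget!",
    "A cure was reported.",
    "Rain is expected on Tuesday.",
    "Shocking scandal shakes town!",
]

rate = flip_rate(toy_detector, originals, altered)
print(f"flip rate: {rate:.2f}")
```

A flip rate of 0 would mean the detector's verdicts are unaffected by the sentiment rewrite. Here the toy detector reacts to emotional wording alone, so the rate is well above 0.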

