Are Large Language Models Good at Detecting Propaganda?
This work addresses the problem of automated propaganda detection for media analysis and misinformation mitigation, but its contribution is incremental: it compares existing models without introducing new methods.
The study evaluated large language models (LLMs) for detecting propaganda techniques in news articles. GPT-4 achieved the highest F1 score among the LLMs tested (F1 = 0.16) but did not outperform a RoBERTa-CRF baseline (F1 = 0.67). Some LLMs did outperform a MultiGranularity Network baseline on specific techniques such as name-calling.
Propagandists use rhetorical devices that rely on logical fallacies and emotional appeals to advance their agendas. Recognizing these techniques is key to making informed decisions. Recent advances in Natural Language Processing (NLP) have enabled the development of systems capable of detecting manipulative content. In this study, we examine several Large Language Models (LLMs) and their performance in detecting propaganda techniques in news articles. We compare the performance of these LLMs with transformer-based models. We find that, while GPT-4 achieves a higher F1 score (F1 = 0.16) than GPT-3.5 and Claude 3 Opus, it does not outperform a RoBERTa-CRF baseline (F1 = 0.67). Additionally, we find that all three LLMs outperform a MultiGranularity Network (MGN) baseline in detecting instances of one out of six propaganda techniques (name-calling), with GPT-3.5 and GPT-4 also outperforming the MGN baseline in detecting instances of appeal to fear and flag-waving.
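Since the comparison above rests entirely on F1 scores, a minimal sketch of how F1 is derived from prediction counts may help in reading the numbers. This section does not specify the evaluation protocol (for example, span-level versus sentence-level matching, or micro versus macro averaging), so the function name `f1_score` and the counts below are hypothetical and purely illustrative, not values from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 from true-positive, false-positive, and false-negative counts.

    F1 is the harmonic mean of precision and recall. This is the standard
    definition; the study's exact matching and averaging scheme is an
    assumption left unspecified here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the study).
example = f1_score(tp=8, fp=2, fn=2)
print(f"F1 = {example:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a model that flags many spurious propaganda spans or misses most true ones will score low even if the other quantity is high.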