Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation
This study informs the debate on LLM misuse in disinformation with concrete data, which is crucial for policymakers and researchers concerned about multilingual misinformation risks.
The study provides the first empirical evidence of LLM-generated text in real-world disinformation datasets, showing an increase after ChatGPT's release and revealing patterns across languages, platforms, and time periods.
The increased sophistication of large language models (LLMs) and the consequent quality of generated multilingual text raise concerns about potential misuse for disinformation. While humans struggle to distinguish LLM-generated content from human-written text, scholars remain divided about their impact. Some argue that heightened fears are overblown due to natural ecosystem limitations, while others contend that specific "longtail" contexts face overlooked risks. Our study bridges this debate by providing the first empirical evidence of LLM presence in the latest real-world disinformation datasets, documenting the increase in machine-generated content following ChatGPT's release, and revealing crucial patterns across languages, platforms, and time periods.