CL AI · Mar 29, 2025

Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation

Dominik Macko, Aashish Anantha Ramakrishnan, Jason Samuel Lucas, Robert Moro, Ivan Srba, Adaku Uchendu, Dongwon Lee

arXiv:2503.23242v1 · 8.33 citations · h-index: 19 · Computer

Originality: Incremental advance

AI Analysis

The paper addresses the debate on LLM misuse in disinformation by offering concrete data, which matters to policymakers and researchers concerned about multilingual misinformation risks.

The study provides the first empirical evidence of LLM-generated text in real-world disinformation datasets, showing an increase after ChatGPT's release and revealing patterns across languages, platforms, and time periods.

The increased sophistication of large language models (LLMs) and the consequent quality of generated multilingual text raise concerns about potential misuse for disinformation. While humans struggle to distinguish LLM-generated content from human-written texts, the scholarly debate about the impact of such content remains divided. Some argue that heightened fears are overblown due to natural ecosystem limitations, while others contend that specific "longtail" contexts face overlooked risks. Our study bridges this debate by providing the first empirical evidence of LLM presence in the latest real-world disinformation datasets, documenting the increase of machine-generated content following ChatGPT's release, and revealing crucial patterns across languages, platforms, and time periods.
