CL · LG · Jun 18, 2024

EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles

arXiv:2406.12614v4 · 17 citations
Originality: Synthesis-oriented
AI Analysis

The dataset gives researchers and practitioners a resource for combating disinformation, though the contribution is incremental because it builds on existing datasets.

The authors introduce EUvsDisinfo, a multilingual dataset for detecting pro-Kremlin disinformation in news articles, and use it to analyze language-specific patterns and topic evolution, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022.

This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest resource to date in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting certain disinformation topics. We further analyse the evolution of topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in training models to effectively distinguish between disinformation and trustworthy content in multilingual settings.

Code Implementations: 1 repo
