CL · May 29

BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon

arXiv:2606.00193 · 54.8 · h-index: 11
Predicted impact: top 96% in CL · last 90 days · Originality: Synthesis-oriented
AI Analysis

For researchers studying fake news in multilingual, low-resource contexts, this work provides a new corpus and empirical insights, but it is incremental as it applies known methods to a new dataset.

The paper introduces BOUTEF, a multilingual corpus for fake news in North Africa, and finds that fake news relies on emotionally charged narratives and hybrid linguistic practices, while debunking content uses factual styles. Statistical analyses show significant associations between thematic categories and veracity, and strong correlations between engagement and fake content visibility.

The rapid spread of fake news on social media has become a major challenge, particularly in multilingual and under-resourced contexts such as North Africa. In this paper, we introduce BOUTEF, a large-scale multilingual corpus designed to study the propagation, characteristics, and impact of fake news in Algeria and Tunisia. The corpus integrates three complementary components: fake narratives, genuine narratives, and associated user-generated comments, along with verified debunking information. It covers a wide range of languages and linguistic varieties, including MSA, Algerian and Tunisian dialects, Arabizi, French, English, and code-switched language. Building on this resource, we conduct a comprehensive empirical analysis combining quantitative and qualitative approaches. We examine thematic distributions, linguistic and rhetorical strategies, sentiment patterns, and social engagement dynamics. Statistical analyses reveal significant associations between thematic categories and message veracity, as well as strong correlations between user engagement and the visibility of fake content. Our findings show that fake news relies heavily on emotionally charged narratives, sensational framing, and hybrid linguistic practices that enhance virality and audience engagement. In contrast, debunking content adopts a more factual and verification-oriented style. Furthermore, a comparative analysis between Algeria and Tunisia highlights both shared dynamics and country-specific characteristics shaped by sociopolitical contexts. The results emphasize the role of informal language practices in the diffusion and reception of misinformation. By providing a rich, annotated, and publicly available dataset, this work contributes to advancing research on fake news detection, low-resource language processing, and the understanding of information disorders in complex linguistic environments.
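The abstract does not say which test underlies the reported association between thematic categories and veracity. A chi-square test of independence is one common choice for two categorical variables, so the following is a minimal sketch under that assumption. The function name `chi_square_independence`, the theme labels, and the counts are all hypothetical and are not taken from BOUTEF.

```python
from collections import Counter
from math import sqrt


def chi_square_independence(pairs):
    """Pearson chi-square statistic for (theme, label) pairs.

    Returns the statistic, its degrees of freedom, and Cramér's V
    as an effect size. Assumed test; the paper may use another one.
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(theme for theme, _ in pairs)
    col_totals = Counter(label for _, label in pairs)

    chi2 = 0.0
    for theme, r in row_totals.items():
        for label, c in col_totals.items():
            # Count expected in this cell if theme and label were independent.
            expected = r * c / n
            chi2 += (observed[(theme, label)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals)) - 1
    cramers_v = sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, dof, cramers_v


# Invented toy data: 80 posts, two hypothetical themes, two veracity labels.
toy_posts = (
    [("health", "fake")] * 30 + [("health", "genuine")] * 10
    + [("politics", "fake")] * 10 + [("politics", "genuine")] * 30
)

chi2, dof, v = chi_square_independence(toy_posts)
print(f"chi2={chi2:.2f}, dof={dof}, Cramer's V={v:.2f}")
```

With one degree of freedom, the 5% critical value of the chi-square distribution is about 3.84, so a statistic above that would count as a significant association in this toy setting. Real corpus counts would replace the invented list.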
