CL · May 25, 2025

SCRum-9: Multilingual Stance Classification over Rumours on Social Media

Yue Li, Jake Vasilakes, Zhixue Zhao, Carolina Scarton

arXiv:2505.18916v32.71 citations · h-index: 23

Originality: Synthesis-oriented

AI Analysis

This work helps social media researchers analyze misleading narratives across languages, though it is incremental in that it builds on existing stance classification datasets.

The authors tackle multilingual stance classification for rumours on social media by introducing SCRum-9, a dataset of 7,516 tweets in 9 languages. Their benchmarks show that LLM-generated synthetic data can boost the performance of fine-tuned multilingual masked language models (MLMs), and that in ambiguous cases model predictions often match annotators' second-choice labels.

We introduce SCRum-9, the largest multilingual Stance Classification dataset for Rumour analysis in 9 languages, containing 7,516 tweets from X. SCRum-9 goes beyond existing stance classification datasets by covering more languages, linking examples to more fact-checked claims (2.1k), and including confidence-related annotations from multiple annotators to account for intra- and inter-annotator variability. Annotations were made by at least two native speakers per language, totalling more than 405 hours of annotation and 8,150 dollars in compensation. Further, SCRum-9 is used to benchmark five large language models (LLMs) and two multilingual masked language models (MLMs) in In-Context Learning (ICL) and fine-tuning setups. This paper also innovates by exploring the use of multilingual synthetic data for rumour stance classification, showing that even LLMs with weak ICL performance can produce valuable synthetic data for fine-tuning small MLMs, enabling them to achieve higher performance than zero-shot ICL in LLMs. Finally, we examine the relationship between model predictions and human uncertainty on ambiguous cases, finding that model predictions often match the second-choice labels assigned by annotators rather than diverging entirely from human judgments. SCRum-9 is publicly released to the research community, with the potential to foster further research on multilingual analysis of misleading narratives on social media.
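The second-choice finding in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the field names (`pred`, `first`, `second`), the label strings, and the toy examples are all assumptions made for illustration, and the actual SCRum-9 annotation schema may differ. The function counts, among the cases where a model misses the annotator's first-choice label, how often its prediction equals the second-choice label.

```python
def second_choice_match_rate(examples):
    """Fraction of first-choice misses where the prediction equals the
    annotator's second-choice label. Field names are hypothetical."""
    # Keep only the cases where the model disagrees with the first choice.
    misses = [ex for ex in examples if ex["pred"] != ex["first"]]
    if not misses:
        return 0.0
    # A miss counts as a hit if it lands on the second-choice label.
    hits = sum(
        1 for ex in misses
        if ex.get("second") is not None and ex["pred"] == ex["second"]
    )
    return hits / len(misses)


# Toy data with placeholder labels, not taken from the dataset.
examples = [
    {"pred": "support", "first": "support", "second": "comment"},
    {"pred": "comment", "first": "query", "second": "comment"},
    {"pred": "deny", "first": "comment", "second": "query"},
    {"pred": "query", "first": "deny", "second": "query"},
]

rate = second_choice_match_rate(examples)
print(f"second-choice match rate among misses: {rate:.2f}")
```

A high rate under this kind of measure would suggest that model errors fall on labels annotators also considered plausible, which is the pattern the abstract reports.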

