Multi-modal Identification of State-Sponsored Propaganda on Social Media
This work provides a new dataset and a general framework for researchers and practitioners working on the critical problem of identifying state-sponsored propaganda on social media, offering a strong benchmark for future research.
This paper addresses the challenge of identifying state-sponsored propaganda on social media by constructing the first balanced dataset for this task, encompassing propaganda from three organizations across two time periods. The authors propose a multi-modal framework that leverages visual and textual content, achieving an F1-score of 0.869 for same-period detection and 0.697 for cross-period detection.
The prevalence of state-sponsored propaganda on the Internet has become a cause for concern in recent years. While much effort has been made to identify state-sponsored Internet propaganda, the problem remains far from solved: the ambiguous definition of propaganda leads to unreliable data labelling, and the huge number of potential predictive features makes the models difficult to explain. This paper is the first attempt to build a balanced dataset for this task. The dataset comprises propaganda by three different organizations across two time periods. We propose a multi-modal framework that detects propaganda messages based solely on their visual and textual content. It achieves promising performance in detecting propaganda by the three organizations both within the same time period (training and testing on data from the same period; F1 = 0.869) and across time periods (training on past data, testing on future data; F1 = 0.697). To reduce the influence of false positive predictions, we vary the decision threshold to examine the relationship between the false positive and true positive rates, and we use visualization tools to explain the predictions made by our models, enhancing the interpretability of our framework. Our new dataset and general framework provide a strong benchmark for the task of identifying state-sponsored Internet propaganda and point to a potential path for future work on this task.
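The abstract does not say how the threshold analysis is carried out. As a minimal sketch, assuming the classifier outputs a propaganda score in [0, 1] for each message and a message is flagged when its score reaches the threshold, the Python below computes the true positive rate, false positive rate, and F1 at several thresholds. F1 is computed as 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall. The function `sweep_thresholds`, the `ThresholdStats` record, and the toy scores are hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ThresholdStats:
    threshold: float
    tpr: float  # true positive rate (recall on propaganda messages)
    fpr: float  # false positive rate (non-propaganda flagged as propaganda)
    f1: float


def sweep_thresholds(
    scores: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> List[ThresholdStats]:
    """Hypothetical helper: flag a message as propaganda when score >= threshold,
    then report TPR, FPR, and F1 for each threshold (labels: 1 = propaganda)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    results = []
    for t in thresholds:
        tp = fp = fn = 0
        for s, y in zip(scores, labels):
            predicted = s >= t
            if predicted and y == 1:
                tp += 1
            elif predicted and y == 0:
                fp += 1
            elif not predicted and y == 1:
                fn += 1
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        results.append(ThresholdStats(t, tpr, fpr, f1))
    return results


if __name__ == "__main__":
    # Toy scores and labels for illustration only (not results from the paper).
    toy_scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
    toy_labels = [1, 1, 0, 1, 0, 0]
    for r in sweep_thresholds(toy_scores, toy_labels, [0.2, 0.5, 0.7]):
        print(f"t={r.threshold:.2f} TPR={r.tpr:.2f} FPR={r.fpr:.2f} F1={r.f1:.2f}")
```

On these toy scores, raising the threshold from 0.5 to 0.7 removes the single false positive without losing a true positive. On real data the trade-off is usually less favorable, and a sweep like this is one way to expose it.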