MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
This work addresses the problem of moderating harmful multimodal content on social media, a concern for both platforms and researchers, and reports incremental improvements in detection accuracy.
The paper tackles the detection of propagandistic and hateful memes by jointly modeling label detection and explanation generation. It contributes a new dataset (MemeXplain) and a multi-stage optimization method that improve accuracy over the state of the art by ~1.4% absolute on ArMeme and ~2.2% absolute on Hateful Memes.
The proliferation of multimodal content on social media presents significant challenges in understanding and moderating complex, context-dependent issues such as misinformation, hate speech, and propaganda. While efforts have been made to develop resources and propose new methods for automatic detection, limited attention has been given to jointly modeling label detection and the generation of explanation-based rationales, and training the two simultaneously often degrades classification performance. To address this challenge, we introduce MemeXplain, an explanation-enhanced dataset for propagandistic memes in Arabic and hateful memes in English, making it the first large-scale resource for these tasks. We then propose a multi-stage optimization approach and use it to train Vision-Language Models (VLMs). Our results show that this strategy significantly improves both label detection and explanation generation quality over the base model, outperforming the current state of the art with an absolute improvement of ~1.4% (Acc) on ArMeme and ~2.2% (Acc) on Hateful Memes. For reproducibility and future research, we aim to make the MemeXplain dataset and scripts publicly available (https://github.com/MohamedBayan/MemeIntel).