ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations
This work addresses the challenge of content moderation for social media platforms by providing a scalable foundation to combat toxic discourse amplified through memes.
The authors tackled the problem of detecting toxic memes by introducing a new dataset of 6,300 real-world memes with binary and fine-grained annotations, enriched with socially relevant tags, and showed that incorporating these tags substantially enhances state-of-the-art VLM detection performance.
The 2025 Global Risks Report identifies state-based armed conflict and societal polarisation among the most pressing global threats, with social media playing a central role in amplifying toxic discourse. Memes, as a widely used mode of online communication, often serve as vehicles for spreading harmful content. However, limited data accessibility and the high cost of dataset curation hinder the development of robust meme moderation systems. To address this challenge, we introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata in the form of socially relevant tags, which add context to each meme. In addition, because most in-the-wild memes do not come with tags, we propose a tag generation module that produces socially grounded tags. Experimental results show that incorporating these tags substantially enhances the performance of state-of-the-art VLMs on detection tasks. Our contributions offer a novel and scalable foundation for improved content moderation in multimodal online environments.
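To make the two-stage annotation scheme and the role of tags concrete, the following is a minimal sketch and not the authors' implementation. The names `MemeRecord`, `FINE_GRAINED_LABELS`, and `build_prompt`, the prompt wording, and the example tags are all hypothetical. The sketch also assumes that every toxic meme carries exactly one fine-grained label, which the abstract suggests but does not state outright.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stage (ii) labels named in the abstract.
FINE_GRAINED_LABELS = ("hateful", "dangerous", "offensive")


@dataclass
class MemeRecord:
    """Hypothetical record layout for one annotated meme."""
    meme_id: str
    is_toxic: bool                      # stage (i): toxic vs. normal
    fine_label: Optional[str] = None    # stage (ii): only for toxic memes
    tags: List[str] = field(default_factory=list)  # socially relevant tags

    def __post_init__(self) -> None:
        # Assumption: a toxic meme has exactly one fine-grained label,
        # and a normal meme has none.
        if self.is_toxic:
            if self.fine_label not in FINE_GRAINED_LABELS:
                raise ValueError(f"toxic meme needs one of {FINE_GRAINED_LABELS}")
        elif self.fine_label is not None:
            raise ValueError("normal meme must not have a fine-grained label")


def build_prompt(record: MemeRecord, use_tags: bool = True) -> str:
    """Build a text prompt to accompany the meme image for a VLM.

    The wording is illustrative only; with use_tags=True the tags are
    appended as extra context, mirroring the tag-enriched setting.
    """
    lines = ["Classify the attached meme as 'toxic' or 'normal'."]
    if use_tags and record.tags:
        lines.append("Context tags: " + ", ".join(record.tags))
    return "\n".join(lines)


if __name__ == "__main__":
    # Example tags are invented for illustration.
    example = MemeRecord("m1", True, "hateful", ["election", "immigration"])
    print(build_prompt(example))
    print(build_prompt(example, use_tags=False))
```

Comparing a model's predictions on the two prompt variants is one simple way to measure how much the tags contribute, under the assumptions stated above.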