Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models
This work addresses content moderation for online platforms in Singapore, where local context and low-resource languages make meme classification difficult. The contribution is incremental, since it applies existing methods to a new dataset.
The researchers detect offensive memes in Singapore's culturally diverse context by fine-tuning a vision-language model on a dataset of 112K memes labeled by GPT-4V, achieving 80.62% accuracy and 0.8192 AUROC on a held-out test set.
Traditional online content moderation systems struggle to classify modern multimodal means of communication, such as memes, a highly nuanced and information-dense medium. This task is especially hard in a culturally diverse society like Singapore, where low-resource languages are used and extensive knowledge of local context is needed to interpret online content. We curate a large collection of 112K memes labeled by GPT-4V for fine-tuning a vision-language model (VLM) to classify offensive memes in the Singapore context. We show the effectiveness of fine-tuned VLMs on our dataset, and propose a pipeline consisting of OCR, translation, and a 7-billion-parameter-class VLM. Our solutions reach 80.62% accuracy and 0.8192 AUROC on a held-out test set, and can greatly aid humans in moderating online content. The dataset, code, and model weights have been open-sourced at https://github.com/aliencaocao/vlm-for-memes-aisg.
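The pipeline and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the class `MemePipeline`, its three callables (`ocr`, `translate`, `vlm_score`), and the 0.5 decision threshold are all hypothetical placeholders for whatever OCR engine, translation system, and fine-tuned VLM are actually used. The sketch only shows how the three stages might be chained and how accuracy and AUROC are computed from the resulting scores.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MemePipeline:
    """Hypothetical three-stage pipeline: OCR -> translation -> VLM scoring."""
    ocr: Callable[[bytes], str]               # image bytes -> text found in the meme
    translate: Callable[[str], str]           # extracted text -> English text
    vlm_score: Callable[[bytes, str], float]  # (image, text) -> P(offensive)
    threshold: float = 0.5                    # assumed cut-off, not from the paper

    def score(self, image: bytes) -> float:
        text = self.ocr(image)
        # Skip translation when OCR finds no text in the image.
        english = self.translate(text) if text.strip() else ""
        return self.vlm_score(image, english)

    def classify(self, image: bytes) -> bool:
        return self.score(image) >= self.threshold


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N), which is
    fine for illustration but slow for large test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Passing the stages in as callables keeps the sketch independent of any particular OCR, translation, or model library, so each stage can be swapped or stubbed out in tests. Accuracy depends on the chosen threshold, whereas AUROC depends only on how the scores rank the examples, which is why the two numbers are reported separately.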