CL, Mar 1

Can Thinking Models Think to Detect Hateful Memes?

arXiv:2603.01225v11 citations, h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses hateful meme detection for online content moderation. It is an incremental advance that contributes a novel training objective and a dataset extension.

The paper tackles hateful meme detection, which requires multimodal reasoning, by proposing a reinforcement learning post-training framework that improves reasoning in thinking-based multimodal large language models. It reports state-of-the-art performance on the Hateful Memes benchmark, improving accuracy and F1 by approximately 1% and explanation quality by approximately 3%.

Hateful memes often require compositional multimodal reasoning: the image and text may appear benign in isolation, yet their interaction conveys harmful intent. Although thinking-based multimodal large language models (MLLMs) have recently advanced vision-language understanding, their capabilities remain underexplored for hateful meme analysis. We propose a reinforcement learning based post-training framework that improves reasoning in thinking-based MLLMs through task-specific rewards and a novel Group Relative Policy Optimization (GRPO) objective. Specifically, we (i) conduct a systematic empirical study of off-the-shelf MLLMs for hateful meme understanding, (ii) extend an existing hateful meme dataset by generating weakly or pseudo-supervised chain-of-thought rationales via distillation, and (iii) introduce a GRPO-based objective that jointly optimizes meme classification and explanation quality to encourage fine-grained, step-by-step reasoning. Experiments on the Hateful Memes benchmark show that our approach achieves state-of-the-art performance, improving accuracy and F1 by approximately 1 percent and explanation quality by approximately 3 percent. We will publicly release our code, dataset extensions, and evaluation resources to support reproducibility.
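The abstract does not spell out the proposed objective, so the following is only a minimal sketch of the general idea it describes: a reward that combines classification correctness with an explanation-quality score, fed into the standard group-relative advantage normalization used by GRPO. The function names, the weights `w_cls` and `w_exp`, and the assumption that explanation quality arrives as a score in [0, 1] are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def combined_reward(pred_label, gold_label, explanation_score,
                    w_cls=1.0, w_exp=0.5):
    """Hypothetical joint reward: label correctness plus explanation quality.

    explanation_score is assumed to be a value in [0, 1] produced by some
    external scorer; the weights are illustrative, not the paper's values.
    """
    cls_reward = 1.0 if pred_label == gold_label else 0.0
    return w_cls * cls_reward + w_exp * explanation_score


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization within one group of sampled outputs.

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One meme, four sampled responses: (predicted label, explanation score).
samples = [("hateful", 0.8), ("benign", 0.6), ("hateful", 0.2), ("benign", 0.1)]
gold = "hateful"

rewards = [combined_reward(pred, gold, score) for pred, score in samples]
advantages = group_relative_advantages(rewards)

for (pred, score), adv in zip(samples, advantages):
    print(f"{pred:8s} explanation={score:.1f} advantage={adv:+.3f}")
```

In this sketch, responses that are both correct and well explained receive the largest positive advantage within their group, which is one plausible way a joint objective could encourage step-by-step reasoning alongside accurate labels.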

