Caption Enriched Samples for Improving Hateful Memes Detection
This work addresses the problem of hateful meme detection for social media moderation, presenting an incremental improvement over existing methods.
The paper tackles hateful meme detection by using an off-the-shelf image captioning tool to capture image content. Adding these captions improves results for various unimodal and multimodal models, and unimodal language models show significant accuracy gains from continued pre-training on augmented and original caption pairs.
The recently introduced hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not. Specifically, neither unimodal language models nor multimodal vision-language models reach human-level performance. Motivated by the need to model the contrast between the image content and the overlaid text, we suggest applying an off-the-shelf image captioning tool to capture the former. We demonstrate that incorporating such automatic captions during fine-tuning improves the results for various unimodal and multimodal models. Moreover, in the unimodal case, continuing the pre-training of language models on augmented and original caption pairs is highly beneficial to classification accuracy.
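As a rough illustration of the caption enrichment idea, the sketch below pairs a meme's overlaid text with an automatically generated image caption to form a single text input for a unimodal language model. The abstract does not specify how the two are combined, so the `MemeSample` class, the `build_model_input` function, and the `[SEP]` separator are all assumptions made for illustration, not the authors' implementation. The caption is assumed to come from some external captioning tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemeSample:
    """Hypothetical container for one meme (field names are assumptions)."""
    meme_text: str           # text overlaid on the meme image
    caption: Optional[str]   # automatic caption of the image, if available
    label: int               # 1 = hateful, 0 = not hateful


def build_model_input(sample: MemeSample, sep_token: str = "[SEP]") -> str:
    """Join the overlaid text and the image caption into one input string.

    The separator is an assumed convention (BERT-style); the source does
    not state how the caption is attached to the meme text.
    """
    # Collapse repeated whitespace so both parts are formatted consistently.
    text = " ".join(sample.meme_text.split())
    if not sample.caption:
        # No caption available: fall back to the overlaid text alone.
        return text
    caption = " ".join(sample.caption.split())
    return f"{text} {sep_token} {caption}"


# Illustrative sample, not taken from any dataset.
sample = MemeSample(
    meme_text="best  day ever",
    caption="a car stuck in deep snow",
    label=0,
)
enriched = build_model_input(sample)
```

Placing the caption next to the overlaid text gives a text-only classifier access to a description of the image, which is one plausible way to expose the contrast between what the meme says and what it shows.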