LG · CL · Aug 9, 2021

An Interpretable Approach to Hateful Meme Detection

arXiv:2108.10069v1 · 17 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of detecting hateful content on social media platforms, but its contribution is incremental: it matches rather than surpasses existing methods.

The paper tackles hateful meme detection with an interpretable approach that combines machine learning and simple heuristics. Its models reach 73.8 validation and 72.7 test auROC, comparable to human performance and to state-of-the-art transformer models.

Hateful memes are an emerging method of spreading hate on the internet, relying on both images and text to convey a hateful message. We take an interpretable approach to hateful meme detection, using machine learning and simple heuristics to identify the features most important to classifying a meme as hateful. In the process, we build a gradient-boosted decision tree and an LSTM-based model that achieve comparable performance (73.8 validation and 72.7 test auROC) to the gold standard of humans and state-of-the-art transformer models on this challenging task.
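The results above are reported as auROC (area under the ROC curve), here on a 0 to 100 scale, so 73.8 corresponds to 0.738. As a rough illustration of what the metric measures, the sketch below computes auROC as the probability that a randomly chosen hateful example is scored above a randomly chosen non-hateful one, with ties counting half. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def auroc(labels, scores):
    """Pairwise-ranking form of auROC: fraction of (positive, negative)
    pairs in which the positive gets the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration (1 = hateful, 0 = not hateful).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.8, 0.1]
toy_auroc = auroc(labels, scores)
print(f"auROC = {toy_auroc:.3f} ({100 * toy_auroc:.1f} on a 0-100 scale)")
```

This quadratic pairwise loop is chosen for readability; on real datasets a rank-based or library implementation would normally be used instead.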
