MM · AI · Aug 28, 2025

MM-HSD: Multi-Modal Hate Speech Detection in Videos

arXiv:2508.20546v1 · 7 citations · h-index: 1 · Has Code · MM
Originality: Incremental advance
AI Analysis

This work addresses hate speech detection in videos, a domain-specific problem with societal impact, but it is incremental as it builds on existing multi-modal methods by adding modalities and attention mechanisms.

The paper tackles hate speech detection in videos by integrating multiple modalities: video frames, audio, and text from both transcripts and on-screen text. It uses Cross-Modal Attention for feature extraction and achieves a state-of-the-art M-F1 score of 0.874 on the HateMM dataset.
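For readers unfamiliar with the metric, the following is a minimal sketch of how an M-F1 score could be computed, assuming that M-F1 denotes macro-averaged F1 over the hate and non-hate classes (the page does not spell this out). The function name `macro_f1` and the toy labels are illustrative only and are not taken from the paper or its code.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (assumed meaning of M-F1)."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels (hypothetical): 1 = hateful video, 0 = non-hateful video.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because each class contributes equally regardless of its size, macro-averaging penalises a model that performs well only on the majority class.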

While hate speech detection (HSD) has been extensively studied in text, existing multi-modal approaches remain limited, particularly in videos. As modalities are not always individually informative, simple fusion methods fail to fully capture inter-modal dependencies. Moreover, previous work often omits relevant modalities such as on-screen text and audio, which may contain subtle hateful content and thus provide essential cues, both individually and in combination with others. In this paper, we present MM-HSD, a multi-modal model for HSD in videos that integrates video frames, audio, and text derived from speech transcripts and from frames (i.e., on-screen text) together with features extracted by Cross-Modal Attention (CMA). We are the first to use CMA as an early feature extractor for HSD in videos, to systematically compare query/key configurations, and to evaluate the interactions between different modalities in the CMA block. Our approach leads to improved performance when on-screen text is used as a query and the rest of the modalities serve as a key. Experiments on the HateMM dataset show that MM-HSD outperforms state-of-the-art methods on M-F1 score (0.874), using concatenation of transcript, audio, video, on-screen text, and CMA for feature extraction on raw embeddings of the modalities. The code is available at https://github.com/idiap/mm-hsd
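The query/key arrangement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a single-head scaled dot-product cross-attention in NumPy, assuming each modality has already been reduced to one embedding of a shared size. The dimensions, the random embeddings and projection matrices, the use of the non-query modalities as values as well as keys, and all names (`cross_modal_attention`, `d_model`, `d_attn`) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query, context, w_q, w_k, w_v):
    """Single-head cross-attention: `query` attends over `context`.

    query:   (n_q, d_model) embeddings of the querying modality
    context: (n_c, d_model) embeddings of the remaining modalities
    """
    q = query @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # one weight per context modality
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, d_attn = 16, 8  # hypothetical sizes

# Hypothetical per-video embeddings, one row per modality.
ocr, transcript, audio, video = (rng.normal(size=(1, d_model)) for _ in range(4))
# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_attn)) for _ in range(3))

# On-screen text is the query; the other modalities are stacked as keys/values.
context = np.concatenate([transcript, audio, video], axis=0)
cma_features, attn = cross_modal_attention(ocr, context, w_q, w_k, w_v)

# Concatenate the raw embeddings with the CMA features for a downstream classifier.
fused = np.concatenate([transcript, audio, video, ocr, cma_features], axis=-1)
print(cma_features.shape, attn.shape, fused.shape)
```

In this sketch the attention weights show how strongly the on-screen text attends to each of the other three modalities, and the fused vector keeps the raw embeddings alongside the attended features, mirroring the concatenation the abstract describes.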

