CV · AI · Aug 3, 2025

HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection

arXiv:2508.01712v2 · 4 citations · h-index: 5 · Has Code · MM
Originality: Synthesis-oriented
AI Analysis

This work gives researchers and practitioners a new annotated dataset for fine-grained hate video detection. The contribution is incremental, since it builds on existing multimodal hate speech detection efforts.

The authors introduce HateClipSeg, a large-scale multimodal dataset of over 11,714 segments annotated at both the video and segment levels, with high inter-annotator agreement (Krippendorff's alpha = 0.817). Benchmark results reveal substantial performance gaps in current models, underscoring the need for more advanced multimodal and temporally aware methods.

Detecting hate speech in videos remains challenging due to the complexity of multimodal content and the lack of fine-grained annotations in existing datasets. We present HateClipSeg, a large-scale multimodal dataset with both video-level and segment-level annotations, comprising over 11,714 segments labeled as Normal or across five Offensive categories (Hateful, Insulting, Sexual, Violence, Self-Harm), along with explicit target victim labels. Our three-stage annotation process yields high inter-annotator agreement (Krippendorff's alpha = 0.817). We propose three tasks to benchmark performance: (1) Trimmed Hateful Video Classification, (2) Temporal Hateful Video Localization, and (3) Online Hateful Video Classification. Results highlight substantial gaps in current models, emphasizing the need for more sophisticated multimodal and temporally aware approaches. The HateClipSeg dataset is publicly available at https://github.com/Social-AI-Studio/HateClipSeg.git.
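For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed as one minus the ratio of observed to expected disagreement. The abstract does not say which distance metric or tooling the authors used, so the nominal variant, the function name `krippendorff_alpha_nominal`, and the toy labels are all assumptions made for illustration. They are not the paper's procedure or data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha with the nominal distance (labels match or differ).

    `units` is a list of label lists, one inner list per annotated segment,
    holding the labels assigned by the annotators who rated that segment.
    This input layout is an assumption for the sketch.
    """
    # Coincidence matrix: every ordered pair of labels from two different
    # annotators within a segment contributes 1 / (m - 1), where m is the
    # number of labels that segment received.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label, and the total number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one segment with two or more labels")

    # Both quantities below omit a common factor of 1 / n, which cancels
    # in the ratio observed / expected.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")

    return 1.0 - observed / expected


# Toy example (hypothetical labels, two annotators per segment).
toy_segments = [
    ["Hateful", "Hateful"],
    ["Normal", "Normal"],
    ["Insulting", "Hateful"],
    ["Normal", "Normal"],
]
toy_alpha = krippendorff_alpha_nominal(toy_segments)
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the reported 0.817 should be read.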
