SI · AI · MM · Sep 8, 2025

A New Dataset and Benchmark for Grounding Multimodal Misinformation

arXiv:2509.08008v1 · 2 citations · h-index: 2 · MM
Originality: Incremental advance
AI Analysis

This work addresses the societal risk posed by online misinformation videos by providing a new dataset and benchmark for explainable multimodal detection. The advance is incremental: it builds on existing detection methods, and its novelty lies mainly in the task formulation.

The paper tackles the detection and localization of misleading segments in multimodal misinformation videos. It introduces the GroundMM task and the GroundLie360 dataset, which includes fine-grained annotations and validation with Snopes evidence, and proposes FakeMark, a baseline method for detection and grounding.

The proliferation of online misinformation videos poses serious societal risks. Current datasets and detection methods primarily target binary classification or single-modality localization based on post-processed data, lacking the interpretability needed to counter persuasive misinformation. In this paper, we introduce the task of Grounding Multimodal Misinformation (GroundMM), which verifies multimodal content and localizes misleading segments across modalities. We present the first real-world dataset for this task, GroundLie360, featuring a taxonomy of misinformation types, fine-grained annotations across text, speech, and visuals, and validation with Snopes evidence and annotator reasoning. We also propose a VLM-based, QA-driven baseline, FakeMark, using single- and cross-modal cues for effective detection and grounding. Our experiments highlight the challenges of this task and lay a foundation for explainable multimodal misinformation detection.
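
To make "localizing misleading segments across modalities" concrete, here is a minimal sketch of how a grounded span might be represented and scored against a reference annotation. The abstract does not specify GroundLie360's annotation schema or the benchmark's metrics, so everything here is an assumption for illustration: the `Segment` type, its field names, and the choice of interval intersection-over-union (a common score in temporal grounding work) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical grounded span; not the GroundLie360 schema."""
    modality: str  # assumed labels: "text", "speech", or "visual"
    start: float   # seconds for speech/visual; could be a character offset for text
    end: float


def interval_iou(pred: Segment, gold: Segment) -> float:
    """Intersection-over-union of two spans within the same modality.

    Spans from different modalities are treated as non-overlapping,
    which is an assumption of this sketch.
    """
    if pred.modality != gold.modality:
        return 0.0
    # Overlap length, clamped at zero when the spans are disjoint.
    inter = max(0.0, min(pred.end, gold.end) - max(pred.start, gold.start))
    union = (pred.end - pred.start) + (gold.end - gold.start) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = Segment("speech", 2.0, 6.0)
    reference = Segment("speech", 4.0, 8.0)
    print(interval_iou(predicted, reference))
```

Under these assumptions, a prediction counts as well grounded only when it falls in the right modality and overlaps the annotated span substantially, which is what separates grounding from binary video-level classification.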

