FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning
This work aims to improve the accuracy and robustness of video misinformation detection for general users and content moderation platforms, especially when evidence is sparse, fragmented, or requires external verification.
This paper addresses the limitations of MLLMs in video misinformation detection, particularly when evidence is sparse, by proposing FactGuard. FactGuard is an agentic framework that uses iterative reasoning and external tools to refine its verification process, achieving state-of-the-art performance on the FakeSV, FakeTT, and FakeVV datasets.
Multimodal large language models (MLLMs) have substantially advanced video misinformation detection through unified multimodal reasoning, but they often rely on fixed-depth inference and place excessive trust in internally generated assumptions, particularly in scenarios where critical evidence is sparse, fragmented, or requires external verification. To address these limitations, we propose FactGuard, an agentic framework for video misinformation detection that formulates verification as an iterative reasoning process built upon MLLMs. FactGuard explicitly assesses task ambiguity and selectively invokes external tools to acquire critical evidence, enabling progressive refinement of reasoning trajectories. To further strengthen this capability, we introduce a two-stage training strategy that combines domain-specific agentic supervised fine-tuning with decision-aware reinforcement learning to optimize tool usage and calibrate risk-sensitive decision making. Extensive experiments on FakeSV, FakeTT, and FakeVV demonstrate FactGuard's state-of-the-art performance and validate its excellent robustness and generalization capacity.
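The iterative verification process described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `VerificationState`, `verify`, `assess`, `judge`, the `ambiguity_threshold` and `max_steps` parameters, and the toy stubs are all assumptions introduced for illustration. In a real system the assessor and judge would presumably be MLLM calls, and the tools would be external retrieval or analysis services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class VerificationState:
    """Hypothetical container for the claim and the evidence gathered so far."""
    claim: str
    evidence: List[str] = field(default_factory=list)
    steps: int = 0


# Hypothetical interfaces (assumptions, not taken from the paper):
# an assessor returns (ambiguity in [0, 1], name of the tool it would call).
Assessor = Callable[[VerificationState], Tuple[float, str]]
Tool = Callable[[str], str]
Judge = Callable[[VerificationState], str]


def verify(
    claim: str,
    assess: Assessor,
    tools: Dict[str, Tool],
    judge: Judge,
    ambiguity_threshold: float = 0.5,
    max_steps: int = 3,
) -> Tuple[str, VerificationState]:
    """Gather evidence only while the task is judged ambiguous, then decide."""
    state = VerificationState(claim=claim)
    while state.steps < max_steps:
        ambiguity, tool_name = assess(state)
        # Stop when the model is confident enough or the requested tool is unavailable.
        if ambiguity < ambiguity_threshold or tool_name not in tools:
            break
        state.evidence.append(tools[tool_name](claim))
        state.steps += 1
    return judge(state), state


# Toy stand-ins so the sketch runs without any model or network access.
def toy_assess(state: VerificationState) -> Tuple[float, str]:
    # Ambiguity drops once some evidence has been collected.
    return (0.9 if not state.evidence else 0.2), "web_search"


def toy_search(query: str) -> str:
    return f"retrieved context for: {query}"


def toy_judge(state: VerificationState) -> str:
    return "decided_with_evidence" if state.evidence else "decided_internally"


label, final_state = verify(
    "Video shows event X", toy_assess, {"web_search": toy_search}, toy_judge
)
```

With these stubs the loop calls the tool once, because the assessed ambiguity falls below the threshold after the first piece of evidence. The threshold and step budget are the kind of tool-usage decisions that the abstract says are optimized during training; here they are simply fixed constants.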