AI, CL, CV, CY, LG | Oct 20, 2025

MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning

arXiv:2510.17590v1 | 2 citations | h-index: 4
Originality: Highly original
AI Analysis

This paper addresses scalable misinformation detection across text and images on web platforms, offering a domain-agnostic approach that matches supervised performance without labeled training data.

The paper tackles multimodal misinformation detection by introducing MIRAGE, an agentic framework that decomposes verification into four sequential modules: visual assessment, cross-modal analysis, web-grounded factual checking, and calibrated judgment. On the MMFakeBench validation set, MIRAGE with GPT-4o-mini achieved 81.65% F1 and 75.1% accuracy, outperforming the strongest zero-shot baseline (74.0% F1) by 7.65 F1 points while holding the false positive rate to 34.3%, versus 97.3% for a judge-only baseline.
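The summary reports F1, accuracy, and false positive rate side by side. The minimal sketch below shows how the three relate, which helps explain why a detector can post a respectable F1 while flagging almost every real post. It assumes a binary setup with "fake" as the positive class; the paper's exact averaging scheme is not stated here, and the counts are hypothetical, not taken from the paper.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary metrics from confusion-matrix counts.

    Assumption: "fake" is the positive class, so a false positive is a
    real post wrongly flagged as misinformation.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # False positive rate: share of real posts that were flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy, "fpr": fpr}


# Hypothetical counts for a detector that flags nearly everything:
# it catches every fake post but also flags 290 of 300 real ones.
flag_everything = binary_metrics(tp=700, fp=290, tn=10, fn=0)
```

With these illustrative counts, recall is perfect and F1 stays high even though nearly all real posts are misclassified, so F1 is best read together with the false positive rate.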

Misinformation spreads across web platforms through billions of daily multimodal posts that combine text and images, overwhelming manual fact-checking capacity. Supervised detection models require domain-specific training data and fail to generalize across diverse manipulation tactics. We present MIRAGE, an inference-time, model-pluggable agentic framework that decomposes multimodal verification into four sequential modules: visual veracity assessment detects AI-generated images, cross-modal consistency analysis identifies out-of-context repurposing, retrieval-augmented factual checking grounds claims in web evidence through iterative question generation, and a calibrated judgment module integrates all signals. MIRAGE orchestrates vision-language model reasoning with targeted web retrieval and outputs structured, citation-linked rationales. On the MMFakeBench validation set (1,000 samples), MIRAGE with GPT-4o-mini achieves 81.65% F1 and 75.1% accuracy, outperforming the strongest zero-shot baseline (GPT-4V with MMD-Agent at 74.0% F1) by 7.65 points while maintaining a 34.3% false positive rate versus 97.3% for a judge-only baseline. Test set results (5,000 samples) confirm generalization with 81.44% F1 and 75.08% accuracy. Ablation studies show that visual verification contributes 5.18 F1 points and retrieval-augmented reasoning contributes 2.97 points. Our results demonstrate that decomposed agentic reasoning with web retrieval can match supervised detector performance without domain-specific training, enabling misinformation detection across modalities where labeled data remains scarce.
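The four-module decomposition described in the abstract can be pictured as a simple sequential pipeline. The sketch below is illustrative only: every class and function name is hypothetical, the three analysis stages are stubs where a real system would call a vision-language model or a web search service, and the judge uses a placeholder "any stage suspicious" rule rather than the paper's calibrated judgment.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Post:
    text: str
    image_path: str


@dataclass
class Signal:
    module: str        # which stage produced this signal
    suspicious: bool   # stage-level flag
    rationale: str     # short explanation for the final report
    citations: List[str] = field(default_factory=list)  # evidence URLs, if any


@dataclass
class Verdict:
    label: str             # "fake" or "real"
    signals: List[Signal]  # kept so the rationale stays traceable


# A stage maps a post to one signal.
Stage = Callable[[Post], Signal]


def visual_veracity(post: Post) -> Signal:
    # Stub: a real stage would ask a VLM whether the image looks AI-generated.
    return Signal("visual_veracity", False, "stub: no check performed")


def cross_modal_consistency(post: Post) -> Signal:
    # Stub: a real stage would compare the caption against the image content.
    return Signal("cross_modal_consistency", False, "stub: no check performed")


def factual_check(post: Post) -> Signal:
    # Stub: a real stage would generate questions, retrieve web evidence,
    # and attach the supporting URLs as citations.
    return Signal("factual_check", False, "stub: no check performed")


def judge(signals: List[Signal]) -> Verdict:
    # Placeholder rule (assumption): flag the post if any stage is suspicious.
    label = "fake" if any(s.suspicious for s in signals) else "real"
    return Verdict(label, signals)


def run_pipeline(post: Post, stages: List[Stage]) -> Verdict:
    signals = [stage(post) for stage in stages]  # run stages in order
    return judge(signals)


verdict = run_pipeline(
    Post(text="example caption", image_path="example.png"),
    [visual_veracity, cross_modal_consistency, factual_check],
)
```

Passing the stages as a list mirrors the "model-pluggable" framing: a stage can be swapped or dropped, which is also how an ablation like the one reported in the abstract could be set up.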
