LG, AI · Jul 22, 2025

METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark

arXiv:2507.16206v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the risk of misinformation from realistic generative media by providing a standardized foundation for interpretable forgery detection. The advance is incremental: it builds on existing detection approaches, adding an enhanced benchmark and training strategy.

The paper tackles the problem of detecting synthetic content across multiple modalities by introducing METER, a unified benchmark that requires not only classification but also evidence-based explanations, achieving broader modality coverage and richer interpretability metrics compared to prior benchmarks.

With the rapid advancement of generative AI, synthetic content across images, videos, and audio has become increasingly realistic, amplifying the risk of misinformation. Existing detection approaches predominantly focus on binary classification while lacking detailed and interpretable explanations of forgeries, which limits their applicability in safety-critical scenarios. Moreover, current methods often treat each modality separately, without a unified benchmark for cross-modal forgery detection and interpretation. To address these challenges, we introduce METER, a unified, multi-modal benchmark for interpretable forgery detection spanning images, videos, audio, and audio-visual content. Our dataset comprises four tracks, each requiring not only real-vs-fake classification but also evidence-chain-based explanations, including spatio-temporal localization, textual rationales, and forgery type tracing. Compared to prior benchmarks, METER offers broader modality coverage and richer interpretability metrics such as spatial/temporal IoU, multi-class tracing, and evidence consistency. We further propose a human-aligned, three-stage Chain-of-Thought (CoT) training strategy combining SFT, DPO, and a novel GRPO stage that integrates a human-aligned evaluator with CoT reasoning. We hope METER will serve as a standardized foundation for advancing generalizable and interpretable forgery detection in the era of generative media.
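The abstract names spatial and temporal IoU among the interpretability metrics but does not define them here. As a rough illustration only, the sketch below uses the standard intersection-over-union definition for a time interval and for an axis-aligned box. The function names, the `(start, end)` and `(x1, y1, x2, y2)` conventions, and the choice to return 0.0 for an empty union are assumptions for this example, not METER's evaluation code.

```python
from typing import Tuple

Interval = Tuple[float, float]            # (start, end), assumed start <= end
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed x1 <= x2, y1 <= y2


def temporal_iou(pred: Interval, truth: Interval) -> float:
    """Standard IoU of two time intervals (hypothetical helper)."""
    # Overlap length, clamped at zero when the intervals are disjoint.
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def spatial_iou(pred: Box, truth: Box) -> float:
    """Standard IoU of two axis-aligned boxes (hypothetical helper)."""
    # Overlap width and height, each clamped at zero.
    w = max(0.0, min(pred[2], truth[2]) - max(pred[0], truth[0]))
    h = max(0.0, min(pred[3], truth[3]) - max(pred[1], truth[1]))
    inter = w * h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A predicted forged segment of 2s-6s against a labelled 4s-8s segment.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
    # A predicted forged region against a labelled region, in pixels.
    print(spatial_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

A score of 1.0 means the predicted localization matches the label exactly, and 0.0 means no overlap. How the benchmark aggregates such scores across samples or tracks is not stated in this text.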
