CV, AI, IR | Jul 6, 2025

Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language Models

Huy Hoan Le, Van Sy Thinh Nguyen, Thi Le Chi Dang, Vo Thanh Khang Nguyen, Truong Thanh Hung Nguyen, Hung Cao

arXiv:2507.04410v1 | 8.41 citations | h-index: 5 | MM

Originality: Synthesis-oriented

AI Analysis

This paper addresses multimedia verification for misinformation detection, but the contribution appears incremental: it builds on existing MLLMs and tools to target a specific challenge.

The paper tackles multimedia misinformation detection with a multi-agent verification system that combines Multimodal Large Language Models with specialized tools. On a challenge dataset sample, the system verified content authenticity, extracted geolocation and timing information, and traced source attribution.

This paper presents our submission to the ACMMM25 - Grand Challenge on Multimedia Verification. We developed a multi-agent verification system that combines Multimodal Large Language Models (MLLMs) with specialized verification tools to detect multimedia misinformation. Our system operates through six stages: raw data processing, planning, information extraction, deep research, evidence collection, and report generation. The core Deep Researcher Agent employs four tools: reverse image search, metadata analysis, fact-checking databases, and verified news processing that extracts spatial, temporal, attribution, and motivational context. We demonstrate our approach on a challenge dataset sample involving complex multimedia content. Our system successfully verified content authenticity, extracted precise geolocation and timing information, and traced source attribution across multiple platforms, effectively addressing real-world multimedia verification scenarios.
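The six-stage flow described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Case` container, every function name, and the stubbed tool outputs are hypothetical, and the real system would call an MLLM and external services where the stubs return placeholder strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Case:
    """Hypothetical container passed from stage to stage."""
    media_path: str
    plan: List[str] = field(default_factory=list)
    extracted: Dict[str, str] = field(default_factory=dict)
    findings: Dict[str, str] = field(default_factory=dict)
    evidence: List[str] = field(default_factory=list)
    report: str = ""
    trace: List[str] = field(default_factory=list)


# Stubs for the four Deep Researcher tools named in the abstract.
# Real versions would query search engines, file metadata, and databases.
TOOLS: Dict[str, Callable[[Case], str]] = {
    "reverse_image_search": lambda case: "stub: no matches",
    "metadata_analysis": lambda case: "stub: no metadata",
    "fact_checking_databases": lambda case: "stub: no entries",
    "verified_news_processing": lambda case: "stub: no articles",
}


def raw_data_processing(case: Case) -> Case:
    # Placeholder: a real stage would split video into frames, audio, text.
    case.extracted["source"] = case.media_path
    return case


def planning(case: Case) -> Case:
    # Assumption: the plan is simply "run every tool once".
    case.plan = list(TOOLS)
    return case


def information_extraction(case: Case) -> Case:
    # Placeholder for MLLM-based description of the content.
    case.extracted["description"] = "stub description"
    return case


def deep_research(case: Case) -> Case:
    # Run each planned tool and record its result under the tool name.
    for name in case.plan:
        case.findings[name] = TOOLS[name](case)
    return case


def evidence_collection(case: Case) -> Case:
    case.evidence = [f"{name}: {result}" for name, result in case.findings.items()]
    return case


def report_generation(case: Case) -> Case:
    case.report = "\n".join(case.evidence)
    return case


STAGES = [
    raw_data_processing,
    planning,
    information_extraction,
    deep_research,
    evidence_collection,
    report_generation,
]


def run_pipeline(media_path: str) -> Case:
    """Run the six stages in order, recording each stage name in the trace."""
    case = Case(media_path)
    for stage in STAGES:
        case = stage(case)
        case.trace.append(stage.__name__)
    return case


if __name__ == "__main__":
    result = run_pipeline("sample.mp4")
    print(result.report)
```

Keeping the stages in a plain list makes the ordering explicit and lets a single stage be swapped out without touching the others; whether the actual system is organized this way is not stated in the abstract.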
