CL, Sep 29, 2024

Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs

arXiv:2409.19656v1, 31 citations, h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the cost of collecting real-world data for multimodal misinformation detection. It is an incremental advance: it improves how well detectors trained on synthetic data generalize to real data.

The paper tackles multimodal misinformation detection in image-text pairs, focusing on the distribution gap between synthetic training data and real-world scenarios. It proposes two model-agnostic data selection methods that improve a small MLLM's performance enough to surpass GPT-4V on real-world fact-checking datasets.

Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.
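The abstract does not say how the two data selection methods work, so the following is only a generic illustration of the idea of matching synthetic data to a real-world distribution, not the authors' method. It assumes each example has already been mapped to an embedding vector by some encoder (not shown) and keeps the synthetic examples whose nearest real example is most similar by cosine similarity. The names `cosine` and `select_synthetic` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_synthetic(synthetic, real, k):
    """Return indices of the k synthetic embeddings closest to the real data.

    Each synthetic embedding is scored by its highest cosine similarity to
    any real embedding; ties are broken by the lower index.
    """
    scored = []
    for i, s in enumerate(synthetic):
        best = max(cosine(s, r) for r in real)
        scored.append((best, i))
    # Sort by descending score, then ascending index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Toy 2-D embeddings; a real pipeline would use encoder outputs (an assumption here).
synthetic = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
real = [[0.9, 0.1], [0.8, 0.3]]
selected = select_synthetic(synthetic, real, k=2)
print(selected)
```

Because the selection only needs embeddings, a sketch like this is independent of the detector trained afterwards, which is one plausible reading of "model-agnostic" in the abstract.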
