CL Oct 27, 2025

M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset

Jiahui Geng, Jonathan Tonglet, Iryna Gurevych

arXiv:2510.23508v1, 2 citations, h-index: 15, Has Code

Originality: Synthesis-oriented

AI Analysis

This dataset addresses the problem of limited real-world data for multimodal fact-checking, though it is incremental as it builds on prior datasets by expanding scope.

The authors address the limitations of existing multimodal fact-checking datasets by introducing M4FC, a dataset of 4,982 images paired with 6,980 claims that covers ten languages and six tasks. They report baseline results for all tasks and analyze how the intermediate tasks interact with verdict prediction.

Existing real-world datasets for multimodal automated fact-checking have multiple limitations: they contain few instances, focus on only one or two languages and tasks, suffer from evidence leakage, or depend on external sets of news articles for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent diverse cultural and geographic contexts. Each claim is available in one or two out of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks influences downstream verdict prediction performance. We make our dataset and code available.
