Multimodal Automated Fact-Checking: A Survey
The survey addresses the challenge of combating misinformation that spans multiple modalities, a problem critical to public trust and information integrity. Its contribution is incremental, however, as it builds on existing automated fact-checking research.
This survey tackles the problem of automated fact-checking for multimodal misinformation, which spreads faster and is perceived as more credible than text-only misinformation, by conceptualizing a framework, mapping related terms, and reviewing benchmarks and models for text, image, audio, and video modalities.
Misinformation is often conveyed in multiple modalities, e.g., a miscaptioned image. Humans perceive multimodal misinformation as more credible, and it spreads faster than its text-only counterpart. While an increasing body of research investigates automated fact-checking (AFC), previous surveys mostly focus on text. In this survey, we conceptualise a framework for AFC that includes subtasks unique to multimodal misinformation. Furthermore, we discuss related terms used in different communities and map them to our framework. We focus on four modalities prevalent in real-world fact-checking: text, image, audio, and video. We survey benchmarks and models, and discuss limitations and promising directions for future research.