MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval
This work addresses the need for accurate and transparent claim verification in multi-modal contexts, particularly in scientific domains. Its contribution is incremental, since it builds on existing verification and explainability methods.
The paper tackles the problem of verifying claims by jointly reasoning over textual and visual evidence and generating explanations. It proposes a model that integrates multi-modal evidence retrieval, verification, and explanation generation, and its experiments, which include a new scientific dataset, AIChartClaim, show the model's strength.
Verifying the truthfulness of claims usually requires joint multi-modal reasoning over both textual and visual evidence, such as analyzing both the textual caption and the chart image. In addition, to make the reasoning process transparent, a textual explanation is necessary to justify the verification result. However, most claim verification work either reasons over textual evidence only or ignores explainability, resulting in inaccurate and unconvincing verification. To address this problem, we propose a novel model that jointly performs evidence retrieval, multi-modal claim verification, and explanation generation. For evidence retrieval, we construct a two-layer multi-modal graph over claims and evidence, in which we design image-to-text and text-to-image reasoning for multi-modal retrieval. For claim verification, we propose token-level and evidence-level fusion to integrate claim and evidence embeddings for multi-modal verification. For explanation generation, we introduce a multi-modal Fusion-in-Decoder for explainability. Finally, since almost all existing datasets are in the general domain, we create a scientific dataset, AIChartClaim, in the AI domain to complement the resources available to the claim verification community. Experiments show the strength of our model.
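To make the Fusion-in-Decoder idea mentioned in the abstract more concrete, the following is a minimal sketch of the general technique only, not the model proposed in the paper. Each (claim, evidence) pair is encoded independently, the encoder outputs are concatenated along the sequence axis, and a decoder state attends over the concatenation. Everything here is an assumption made for illustration: the names `toy_encoder`, `cross_attention`, and `fuse_in_decoder` are hypothetical, the encoder is a deterministic stand-in rather than a trained network, and the chart image is represented by placeholder patch tokens.

```python
import math
import random


def toy_encoder(tokens, dim=4):
    """Hypothetical stand-in encoder: one deterministic pseudo-random vector per token."""
    vectors = []
    for tok in tokens:
        rng = random.Random(tok)  # seeded by the token string, so output is repeatable
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, memory):
    """Single-head scaled dot-product attention of one query vector over memory rows."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, row)) / scale for row in memory]
    weights = softmax(scores)
    context = [
        sum(w * row[i] for w, row in zip(weights, memory))
        for i in range(len(query))
    ]
    return context, weights


def fuse_in_decoder(claim, evidence_items, dim=4):
    """Encode each (claim, evidence) pair separately, then concatenate the outputs."""
    encoded = [toy_encoder(claim + ev, dim) for ev in evidence_items]
    # Flatten along the sequence axis so the decoder sees all evidence at once.
    return [vec for seq in encoded for vec in seq]


claim = ["model", "A", "outperforms", "model", "B"]
evidence = [
    ["caption:", "accuracy", "of", "A", "and", "B"],  # textual evidence
    ["<chart_patch_1>", "<chart_patch_2>"],           # placeholder visual tokens (assumption)
]

memory = fuse_in_decoder(claim, evidence)
decoder_state = [0.1, -0.2, 0.3, 0.4]  # stand-in for a decoder hidden state
context, weights = cross_attention(decoder_state, memory)
print(len(memory), len(context))
```

The design point this illustrates is that encoding cost grows linearly with the number of evidence items, because each pair is encoded on its own, while the decoder can still combine information across all of them in a single attention step.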