CL · AI · Oct 25, 2023

XFEVER: Exploring Fact Verification across Languages

Yi-Chen Chang, Canasai Kruengkrai, Junichi Yamagishi

arXiv:2310.16278v1 · h-index: 14 · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for cross-lingual fact verification benchmarks by providing a dataset and baselines for researchers, but its contribution is incremental because it builds on an existing English dataset (FEVER).

This paper tackles fact verification across languages by introducing the XFEVER dataset, which translates the FEVER dataset into six languages. It finds that multilingual language models can be used to build fact verification models efficiently, although performance varies by language and is lower than in English. It also shows that model miscalibration can be mitigated by considering the similarity between the model's predictions in English and in the target language.
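The summary mentions model miscalibration, i.e. a gap between how confident a classifier is and how often it is right. A common way to quantify this is the expected calibration error (ECE): bin predictions by top-1 confidence, then average the absolute gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is a generic equal-width-bin ECE in plain Python. Whether XFEVER reports exactly this metric, and with how many bins, is an assumption here; the function name `expected_calibration_error` and the three-way label indexing (e.g. 0 = SUPPORTS, 1 = REFUTES, 2 = NOT ENOUGH INFO) are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE over top-1 confidence (generic sketch, not the paper's code)."""
    n = len(labels)
    if n == 0 or n != len(probs):
        raise ValueError("probs and labels must be non-empty and the same length")

    counts = [0] * n_bins        # number of predictions per bin
    conf_sum = [0.0] * n_bins    # summed top-1 confidence per bin
    correct_sum = [0.0] * n_bins # summed correctness (0/1) per bin

    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class
        conf = p[pred]
        # Half-open bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sum[idx] += conf
        correct_sum[idx] += 1.0 if pred == y else 0.0

    ece = 0.0
    for c, s_conf, s_acc in zip(counts, conf_sum, correct_sum):
        if c:
            # Weight each bin's |accuracy - confidence| gap by its share of examples.
            ece += (c / n) * abs(s_acc / c - s_conf / c)
    return ece


if __name__ == "__main__":
    # Hypothetical 3-class predictions for four claims.
    probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.4, 0.35, 0.25]]
    labels = [0, 1, 1, 0]
    print(expected_calibration_error(probs, labels))
```

Lower is better; a model whose 70%-confidence predictions are correct about 70% of the time contributes little to the sum.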

This paper introduces the Cross-lingual Fact Extraction and VERification (XFEVER) dataset, designed for benchmarking fact verification models across different languages. We constructed it by translating the claim and evidence texts of the Fact Extraction and VERification (FEVER) dataset into six languages. The training and development sets were translated using machine translation, whereas the test set includes both texts translated by professional translators and machine-translated texts. Using the XFEVER dataset, we define two cross-lingual fact verification scenarios, zero-shot learning and translate-train learning, and propose baseline models for each scenario. Experimental results show that a multilingual language model can be used to build fact verification models in different languages efficiently. However, performance varies by language and is somewhat lower than in the English case. We also found that model miscalibration can be effectively mitigated by considering the prediction similarity between English and the target languages. The XFEVER dataset, code, and model checkpoints are available at https://github.com/nii-yamagishilab/xfever.
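The abstract says miscalibration is mitigated "by considering the prediction similarity between English and the target languages" but does not give the formula here. One plausible reading, sketched below purely as an assumption, is a translate-train objective that supervises both the English and the translated version of the same claim-evidence pair and adds a penalty for disagreement between the two predicted label distributions. The choice of symmetric KL divergence, the weight `lam`, and every function name (`symmetric_kl`, `translate_train_loss`, etc.) are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))


def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Average of KL(p||q) and KL(q||p); zero when the predictions agree."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def cross_entropy(p: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the gold label under distribution p."""
    return -math.log(p[label] + EPS)


def translate_train_loss(
    p_en: Sequence[float],
    p_tgt: Sequence[float],
    label: int,
    lam: float = 1.0,
) -> float:
    """Hypothetical objective: supervise both language views, penalize disagreement."""
    task = cross_entropy(p_en, label) + cross_entropy(p_tgt, label)
    return task + lam * symmetric_kl(p_en, p_tgt)


if __name__ == "__main__":
    # Hypothetical predictions for one FEVER pair and its machine translation.
    p_en = [0.7, 0.2, 0.1]
    p_tgt = [0.5, 0.3, 0.2]
    print(translate_train_loss(p_en, p_tgt, label=0, lam=0.5))
```

Setting `lam=0` reduces this to ordinary fine-tuning on both language views. Under the usual reading of the zero-shot scenario (training on English only), no target-language training text is available, so a term like this could only be applied in the translate-train setting.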

