CL · May 28, 2025

Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches

arXiv:2505.22118v2 · 1 citation · h-index: 11 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of limited fact-check availability in certain languages for global narratives, though it is incremental as it builds on existing retrieval methods.

The paper tackles the problem of retrieving fact-checked claims across languages by comparing multilingual and crosslingual approaches, and finds that LLM-based re-ranking achieves the best results on a dataset covering 47 languages and 283 language combinations.

Retrieval of previously fact-checked claims is a well-established task, whose automation can assist professional fact-checkers in the initial steps of information verification. Previous works have mostly tackled the task monolingually, i.e., with both the input and the retrieved claims in the same language. However, especially for languages with limited availability of fact-checks and in the case of global narratives, such as pandemics, wars, or international politics, it is crucial to be able to retrieve claims across languages. In this work, we examine strategies to improve multilingual and crosslingual performance, namely selection of negative examples (in the supervised setting) and re-ranking (in the unsupervised setting). We evaluate all approaches on a dataset containing posts and claims in 47 languages (283 language combinations). We observe that the best results are obtained by using LLM-based re-ranking, followed by fine-tuning with negative examples sampled using a sentence similarity-based strategy. Most importantly, we show that crosslinguality is a setup with its own unique characteristics compared to the multilingual setup.
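
To make the idea of similarity-based negative sampling concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a sentence encoder, and the names `sample_hard_negatives`, `claims`, and `post`, along with the example data, are all hypothetical. A real crosslingual setup would presumably need a multilingual sentence encoder, since word overlap does not carry across languages.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sample_hard_negatives(post, positive_ids, claims, k=2):
    """Return ids of the k claims most similar to the post,
    excluding the claims known to be its correct fact-checks."""
    post_vec = embed(post)
    scored = [
        (cosine(post_vec, embed(text)), claim_id)
        for claim_id, text in claims.items()
        if claim_id not in positive_ids
    ]
    # Highest similarity first; ties broken by claim id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [claim_id for _, claim_id in scored[:k]]


# Hypothetical example data, for illustration only.
claims = {
    "c1": "vaccines cause infertility in women",
    "c2": "vaccines contain microchips for tracking",
    "c3": "the election was decided by mail ballots",
    "c4": "drinking hot water cures the virus",
}
post = "new vaccines contain tracking microchips"
negatives = sample_hard_negatives(post, positive_ids={"c2"}, claims=claims, k=2)
print(negatives)
```

The negatives selected this way resemble the post without being its correct fact-check, which makes them harder training examples than randomly drawn claims.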

