CL · Mar 4, 2025

Large Language Models for Multilingual Previously Fact-Checked Claim Detection

Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Tatiana Anikina, Michal Gregor, Marián Šimko

arXiv:2503.02737v3 · 12.05 citations · h-index: 10 · Has Code · EMNLP

Originality: Synthesis-oriented

AI Analysis

The paper addresses the spread of false information across languages, a practical challenge for fact-checkers, but its contribution is incremental: it evaluates existing LLMs on a new task.

This paper tackles the automatic detection of previously fact-checked claims across multiple languages, with the aim of reducing duplicated effort among human fact-checkers. It finds that large language models perform well for high-resource languages but struggle with low-resource ones, and that translating the original texts into English helps for low-resource languages.

In our era of widespread false information, human fact-checkers often face the challenge of duplicating efforts when verifying claims that may have already been addressed in other countries or languages. As false information transcends linguistic boundaries, the ability to automatically detect previously fact-checked claims across languages has become an increasingly important task. This paper presents the first comprehensive evaluation of large language models (LLMs) for multilingual previously fact-checked claim detection. We assess seven LLMs across 20 languages in both monolingual and cross-lingual settings. Our results show that while LLMs perform well for high-resource languages, they struggle with low-resource languages. Moreover, translating original texts into English proved to be beneficial for low-resource languages. These findings highlight the potential of LLMs for multilingual previously fact-checked claim detection and provide a foundation for further research on this promising application of LLMs.
