CL
May 13, 2023

Multilingual Previously Fact-Checked Claim Retrieval

arXiv:2305.07991v2
148 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge fact-checkers face in handling the vast amount of online misinformation. Its contribution is incremental: it builds on existing retrieval methods and adds a new dataset.

The paper helps fact-checkers by retrieving existing fact-checks that are relevant to online content. It introduces MultiClaim, a multilingual dataset of 28k posts in 27 languages and 206k fact-checks in 39 languages, and shows that supervised fine-tuning improves significantly over unsupervised methods.

Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving already existing fact-checks relevant to the content being investigated. This paper introduces a new multilingual dataset -- MultiClaim -- for previously fact-checked claim retrieval. We collected 28k posts in 27 languages from social media, 206k fact-checks in 39 languages written by professional fact-checkers, as well as 31k connections between these two groups. This is the most extensive and the most linguistically diverse dataset of this kind to date. We evaluated how different unsupervised methods fare on this dataset and its various dimensions. We show that evaluating such a diverse dataset has its complexities and proper care needs to be taken before interpreting the results. We also evaluated a supervised fine-tuning approach, improving upon the unsupervised method significantly.
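
To make the retrieval task concrete, the following is a minimal sketch, not the paper's method: it ranks a handful of fact-checks against a post by bag-of-words cosine similarity and reports the fraction of posts whose linked fact-check appears in the top k. The data, the function names (`retrieve`, `hit_rate_at_k`), and the choice of metric are all illustrative assumptions; none of them come from MultiClaim.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(post, fact_checks, k=3):
    """Return the ids of the k fact-checks most similar to the post."""
    query = Counter(tokenize(post))
    scored = [
        (cosine(query, Counter(tokenize(text))), fc_id)
        for fc_id, text in fact_checks.items()
    ]
    # Highest score first; ties broken by id so the ranking is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [fc_id for _, fc_id in scored[:k]]


def hit_rate_at_k(posts, links, fact_checks, k=3):
    """Fraction of posts with at least one linked fact-check in the top k."""
    hits = sum(
        1
        for post_id, text in posts.items()
        if set(retrieve(text, fact_checks, k)) & links[post_id]
    )
    return hits / len(posts)


# Toy data, invented for illustration only (not taken from MultiClaim).
FACT_CHECKS = {
    "fc1": "No, drinking hot water does not cure the flu",
    "fc2": "The photo of a shark on a flooded highway is fabricated",
    "fc3": "Vaccines do not contain tracking microchips",
}
POSTS = {
    "p1": "Doctors confirm hot water will cure the flu in one day!",
    "p2": "Look at this shark swimming on the highway after the flood",
}
# Post-to-fact-check connections, analogous in spirit to the dataset's links.
LINKS = {"p1": {"fc1"}, "p2": {"fc2"}}

if __name__ == "__main__":
    print(retrieve(POSTS["p1"], FACT_CHECKS, k=1))
    print(hit_rate_at_k(POSTS, LINKS, FACT_CHECKS, k=1))
```

A purely lexical scorer like this one cannot match a post to a fact-check written in a different language, because the two share few or no tokens. A multilingual setting therefore needs either translation or a representation shared across languages.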

Code Implementations: 1 repo