AfrIFact: Cultural Information Retrieval, Evidence Extraction and Fact Checking for African Languages
This work addresses fact-checking challenges for communities with limited access to information in African languages, though the contribution is incremental because it builds on existing datasets and methods.
The authors tackle automatic fact-checking for low-resource African languages by introducing AfrIFact, a dataset covering ten African languages and English. They find that even the best embedding models lack cross-lingual retrieval capabilities, that few-shot prompting improves performance by up to 43% in AfriqueQwen-14B, and that task-specific fine-tuning further improves fact-checking accuracy by up to 26%.
Assessing the veracity of a claim made online is a complex and important task with real-world implications. When these claims are directed at communities with limited access to information and the content concerns issues such as healthcare and culture, the consequences intensify, especially in low-resource languages. In this work, we introduce AfrIFact, a dataset that covers the necessary steps for automatic fact-checking (i.e., information retrieval, evidence extraction, and fact checking), in ten African languages and English. Our evaluation results show that even the best embedding models lack cross-lingual retrieval capabilities, and that cultural and news documents are easier to retrieve than healthcare-domain documents, both in large corpora and in single documents. We show that LLMs lack robust multilingual fact-verification capabilities in African languages, while few-shot prompting improves performance by up to 43% in AfriqueQwen-14B, and task-specific fine-tuning further improves fact-checking accuracy by up to 26%. These findings, along with our release of the AfrIFact dataset, encourage work on low-resource information retrieval, evidence retrieval, and fact checking.
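To make the three stages named in the abstract concrete (information retrieval, evidence extraction, fact checking), the following is a minimal toy sketch of how they chain together. It is not the authors' method: the function names, the example corpus, the label strings, and the threshold are all hypothetical, and bag-of-words cosine similarity stands in for the embedding models and LLM verifiers that the paper actually evaluates.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real multilingual tokenizer."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(claim, corpus, k=1):
    """Stage 1 (information retrieval): rank documents by similarity to the claim."""
    query = Counter(tokenize(claim))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def extract_evidence(claim, document):
    """Stage 2 (evidence extraction): pick the most claim-like sentence in one document."""
    query = Counter(tokenize(claim))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return max(sentences, key=lambda s: cosine(query, Counter(tokenize(s))))


def verify(claim, evidence, threshold=0.5):
    """Stage 3 (fact checking): placeholder verdict based on lexical overlap only.

    Assumption: a real system would use an LLM or a fine-tuned classifier here;
    the labels and threshold are illustrative, not taken from AfrIFact.
    """
    score = cosine(Counter(tokenize(claim)), Counter(tokenize(evidence)))
    return "SUPPORTED" if score >= threshold else "NOT ENOUGH INFO"


# Hypothetical two-document corpus (one cultural, one healthcare document).
corpus = [
    "Jollof rice is a popular dish in West Africa. It is made with rice, tomatoes and spices.",
    "Malaria is transmitted by mosquitoes. Bed nets reduce the risk of infection.",
]
claim = "Malaria is transmitted by mosquitoes."

top_document = retrieve(claim, corpus, k=1)[0]
evidence = extract_evidence(claim, top_document)
verdict = verify(claim, evidence)
```

A purely lexical retriever like this one cannot match a claim in one language to a document in another, which is the cross-lingual setting where the abstract reports that even the best embedding models fall short.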