CLAIIR
May 11, 2023

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

arXiv:2305.06897v1, 33 citations
Originality: Incremental advance
AI Analysis

AfriQA addresses the information needs of users of African languages, for whom cross-lingual content is often the only high-coverage source of answers. It is an important but incremental step in QA technology.

The authors tackled the lack of digital content for African languages by creating AfriQA, a cross-lingual open-retrieval QA dataset with over 12,000 examples across 10 languages, which proves challenging for state-of-the-art models.

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

Code Implementations: 1 repo