CL · Jul 5, 2022

Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic

arXiv:2207.01918v1 · 628 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses data scarcity for open QA in non-English languages and offers a practical approach for low-resource languages such as Icelandic. The advance is incremental, since it builds on existing cross-lingual methods.

The authors tackle the challenge of building open question answering systems for low-resource languages such as Icelandic by combining machine-translated data with a bilingual language model. The resulting system adapts efficiently to monolingual open QA with limited labeled data.

It can be challenging to build effective open question answering (open QA) systems for languages other than English, mainly due to a lack of labeled data for training. We present a data-efficient method to bootstrap such a system for languages other than English. Our approach requires only limited QA resources in the given language, along with machine-translated data, and at least a bilingual language model. To evaluate our approach, we build such a system for the Icelandic language and evaluate performance over trivia-style datasets. The corpora used for training are English in origin but machine-translated into Icelandic. We train a bilingual Icelandic/English language model to embed English context and Icelandic questions, following the methodology introduced with DensePhrases (Lee et al., 2021). The resulting system is an open-domain cross-lingual QA system between Icelandic and English. Finally, the system is adapted for Icelandic-only open QA, demonstrating how it is possible to efficiently create an open QA system with limited access to curated datasets in the language of interest.
