CL · Oct 14, 2021

Cross-Lingual Open-Domain Question Answering with Answer Sentence Generation

arXiv:2110.07150v3 · 300 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating complete-sentence answers across languages for users in multilingual settings. It is an incremental extension of existing methods.

The paper tackles cross-lingual open-domain question answering by extending generative QA (GenQA) methods to multiple languages. It introduces the GenTyDiQA dataset and a cross-lingual generative model that outperforms answer sentence selection baselines in all five languages studied and monolingual generative pipelines in three of the five.

Open-Domain Generative Question Answering has achieved impressive performance in English by combining document-level retrieval with answer generation. These approaches, which we refer to as GenQA, can generate complete sentences, effectively answering both factoid and non-factoid questions. In this paper, we extend GenQA to the multilingual and cross-lingual settings. For this purpose, we first introduce GenTyDiQA, an extension of the TyDiQA dataset with well-formed and complete answers for Arabic, Bengali, English, Japanese, and Russian. Based on GenTyDiQA, we design a cross-lingual generative model that produces full-sentence answers by exploiting passages written in multiple languages, including languages different from the question. Our cross-lingual generative system outperforms answer sentence selection baselines for all 5 languages and monolingual generative pipelines for three out of five languages studied.

