CL · LG · Apr 11, 2020

LAReQA: Language-agnostic answer retrieval from a multilingual pool

arXiv:2004.05484v1 · 1020 citations
AI Analysis

This work addresses multilingual retrieval for AI systems, presenting a new benchmark that highlights the need for strong cross-lingual alignment. The modeling contribution is incremental, since the baselines build on the existing mBERT model.

The authors tackled the problem of language-agnostic answer retrieval by introducing LAReQA, a benchmark requiring strong cross-lingual alignment, and found that augmenting training data with machine translation significantly improved performance over using mBERT out-of-the-box.

We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for "strong" cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translation is effective, and improves significantly over using mBERT out-of-the-box. Interestingly, the embedding baseline that performs the best on LAReQA falls short of competing baselines on zero-shot variants of our task that only target "weak" alignment. This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
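The "strong" alignment condition in the abstract can be illustrated with a small sketch. This is not the authors' code or evaluation script. The 2-D vectors, the candidate ids, and the helper names (`cosine`, `rank_pool`, `strongly_aligned`) are all hypothetical, and cosine similarity is assumed as the scoring function. It only shows what it means for a related cross-language pair to score higher than an unrelated same-language pair when candidates in several languages share one pool.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pool(query_vec, pool):
    """Return candidate ids sorted by descending similarity to the query."""
    return sorted(pool, key=lambda cid: cosine(query_vec, pool[cid]), reverse=True)


def strongly_aligned(query_vec, related_cross_lang, unrelated_same_lang):
    """True if the related cross-language candidate is closer to the query
    than the unrelated same-language candidate."""
    return cosine(query_vec, related_cross_lang) > cosine(query_vec, unrelated_same_lang)


# Made-up embeddings for illustration only (not produced by mBERT).
query_en = [1.0, 0.2]
pool = {
    "de_correct": [0.9, 0.3],    # relevant answer, different language
    "en_unrelated": [0.2, 1.0],  # irrelevant answer, same language as the query
    "en_correct": [1.0, 0.25],   # relevant answer, same language
}

ranking = rank_pool(query_en, pool)
aligned = strongly_aligned(query_en, pool["de_correct"], pool["en_unrelated"])
print(ranking, aligned)
```

In this toy pool both relevant answers outrank the unrelated English candidate, which is the behavior the strong-alignment criterion asks for. An encoder with a same-language bias would instead push `en_unrelated` above `de_correct`.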

Code Implementations: 1 repo