IR · CL · LG · Dec 30, 2019

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

arXiv:1912.13080v1 · 31 citations
Originality: Incremental advance
AI Analysis

The paper addresses multilingual search for billions of non-English users, though the advance is incremental: it adapts existing methods to new languages.

The paper tackles the lack of training data for non-English information retrieval by using pre-trained multilingual language models to transfer an English-trained retrieval system to Arabic, Chinese Mandarin, and Spanish in a zero-shot setting, significantly outperforming unsupervised techniques.

While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.
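The zero-shot use described in the abstract (scoring query-document pairs in a language the model was never trained on) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a reranking setup in which candidate documents are already available, which the abstract does not specify. The names `Scorer`, `rerank`, and `overlap_scorer` are hypothetical, and `overlap_scorer` is a toy stand-in for a multilingual language model fine-tuned on English relevance data.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping (query, document) to a relevance score.
# In the setting the abstract describes, this would be a multilingual language model
# trained on English collections and applied unchanged to another language.
Scorer = Callable[[str, str], float]


def rerank(query: str, candidates: List[str], score_fn: Scorer,
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every candidate against the query and return the top_k, best first."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def overlap_scorer(query: str, doc: str) -> float:
    """Toy stand-in for the trained model: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


if __name__ == "__main__":
    # Spanish query and documents; the scorer is never given language-specific data.
    docs = ["el gato duerme", "precio del petróleo sube", "petróleo"]
    for doc, score in rerank("precio petróleo", docs, overlap_scorer, top_k=2):
        print(f"{score:.2f}  {doc}")
```

Keeping the scorer behind a plain callable means the same reranking loop works whether the scores come from a toy function, an unsupervised baseline, or a neural model, which is what makes a like-for-like comparison between them straightforward.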

Code Implementations: 1 repo
