CLIR
Sep 3, 2021

Cross-Lingual Training with Dense Retrieval for Document Retrieval

arXiv:2109.01628v1
7 citations
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of training resources for document retrieval in non-English languages, offering practical solutions for multilingual search systems.

The paper tackles the problem of applying dense retrieval to document ranking in non-English languages by exploring techniques for transferring from English annotations. It finds that zero-shot model-based transfer with mBERT improves search quality, and that weakly-supervised transfer performs competitively against more resource-intensive methods.

Dense retrieval has shown great success in passage ranking in English. However, its effectiveness in document retrieval for non-English languages remains unexplored due to the limitation in training resources. In this work, we explore different transfer techniques for document ranking from English annotations to multiple non-English languages. Our experiments on the test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families reveal that zero-shot model-based transfer using mBERT improves the search quality in non-English mono-lingual retrieval. Also, we find that weakly-supervised target language transfer yields competitive performances against the generation-based target language transfer that requires external translators and query generators.
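As a rough illustration of the ranking step behind dense retrieval, the sketch below scores documents by the inner product between a query vector and each document vector. It assumes inner-product scoring, as is common in dense retrievers; the abstract does not state the paper's exact scoring or encoder setup. The function names and toy vectors are hypothetical stand-ins for encodings that a multilingual encoder such as mBERT would produce. In zero-shot model-based transfer, the encoder trained on English annotations would be applied unchanged to target-language queries and documents before this step.

```python
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real ones would come from the trained encoder.
toy_docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.4, 0.4, 0.2],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_documents(toy_query, toy_docs, k=2)
```

Because queries and documents are encoded independently, document vectors can be computed once offline, so only the query needs to be encoded at search time.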

