CL · Apr 19, 2022

Cross-Lingual Phrase Retrieval

Microsoft
arXiv:2204.08887v1 · 640 citations · h-index: 32 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the open problem of phrase-level retrieval across languages. The contribution is incremental because it builds on existing word- and sentence-level methods.

The paper tackles cross-lingual phrase retrieval by proposing XPR, which learns phrase representations from unlabeled example sentences. XPR outperforms state-of-the-art baselines and transfers zero-shot to language pairs unseen during training.

Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines that use word-level or sentence-level representations. XPR also shows impressive zero-shot transferability, enabling the model to perform retrieval in a language pair unseen during training. Our dataset, code, and trained models are publicly available at www.github.com/cwszz/XPR/.
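The abstract says XPR extracts phrase representations from example sentences and then retrieves matching phrases across languages. The snippet below is a minimal sketch of that retrieval step. It assumes, without claiming this is the paper's exact method, that a phrase's vector in one sentence is the mean of its contextual token vectors, that these are averaged over all example sentences, and that candidates are ranked by cosine similarity. The encoder is omitted: the token vectors are hand-written toy values, and every function name and phrase label here is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# One example sentence: its per-token vectors plus the phrase span (start, end), end exclusive.
Example = Tuple[Sequence[Vector], Tuple[int, int]]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def phrase_in_sentence(token_vecs: Sequence[Vector], span: Tuple[int, int]) -> Vector:
    """Assumed pooling: average the token vectors covering the phrase in one sentence."""
    start, end = span
    return mean(token_vecs[start:end])


def phrase_representation(examples: Sequence[Example]) -> Vector:
    """Assumed aggregation: average the in-context phrase vectors over all example sentences."""
    return mean([phrase_in_sentence(vecs, span) for vecs, span in examples])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, candidates: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the k candidate labels most similar to the query phrase vector."""
    ranked = sorted(candidates, key=lambda label: cosine(query, candidates[label]), reverse=True)
    return ranked[:k]


# Toy usage: an English query phrase seen in two example sentences (2-d toy vectors).
query_examples: List[Example] = [
    ([[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]], (1, 3)),  # phrase = tokens 1..2
    ([[1.0, 0.2], [0.2, 0.0]], (0, 1)),              # phrase = token 0
]
# Hypothetical target-language candidates, each built from its own example sentence.
candidates: Dict[str, Vector] = {
    "Eis": phrase_representation([([[0.9, 0.2], [0.0, 0.0]], (0, 1))]),
    "Hund": phrase_representation([([[0.1, 1.0]], (0, 1))]),
}

query = phrase_representation(query_examples)
print(retrieve(query, candidates, k=2))
```

Averaging over several example sentences is what lets the representation come from unlabeled context alone: no phrase-level labels are needed at retrieval time, only sentences in which each phrase occurs.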

Code Implementations: 1 repo
