CL, Jun 12, 2019

Probing Multilingual Sentence Representations With X-Probe

arXiv:1906.05061v11099 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better evaluation tools in multilingual NLP, though its contribution is incremental: it extends existing probing methods to new languages.

The paper tackles the problem of evaluating multilingual sentence representations. It builds probing datasets in five languages (English, French, German, Spanish and Russian) and tests six cross-lingually mapped encoders per language, finding that the mapped representations often retain certain linguistic information better than representations from English encoders trained on natural language inference (NLI).

This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.
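
For readers unfamiliar with probing, the general idea can be sketched as follows: a small classifier is trained on frozen sentence vectors to predict a linguistic property, and its held-out accuracy is read as a signal of how much of that property the vectors retain. The sketch below is illustrative only and is not the paper's setup. The vectors are synthetic stand-ins for encoder output, the binary property is hypothetical, and the names `train_probe` and `probe_accuracy` are made up for this example.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp never overflows.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features, labels, epochs=50, lr=0.5):
    """Fit a logistic-regression probe with plain SGD.

    The sentence vectors are treated as fixed inputs; only the
    probe's own weights and bias are updated.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples whose property the probe predicts correctly."""
    correct = 0
    for x, y in zip(features, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((score > 0) == (y == 1))
    return correct / len(labels)


# Synthetic stand-in for frozen sentence representations (assumption:
# a real experiment would take these from a sentence encoder).
rng = random.Random(0)
vectors = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(200)]

# Hypothetical binary property, encoded linearly in dimension 0.
labels = [1 if v[0] > 0 else 0 for v in vectors]

train_x, test_x = vectors[:150], vectors[150:]
train_y, test_y = labels[:150], labels[150:]

weights, bias = train_probe(train_x, train_y)
accuracy = probe_accuracy(weights, bias, test_x, test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

Here the property is linearly recoverable by construction, so the probe should score well above chance. With vectors from a real encoder, comparing this accuracy across encoders is what lets one say that one representation retains a property better than another.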
