CL · Apr 13, 2017

Learning Joint Multilingual Sentence Representations with Neural Machine Translation

arXiv:1704.04154v2 · 218 citations
AI Analysis

This work addresses cross-lingual semantic understanding for multilingual NLP applications, but is incremental as it builds on existing neural machine translation frameworks.

The paper tackles the problem of learning language-independent sentence representations using neural machine translation across six languages, and finds that sentences close in embedding space are semantically related despite structural differences.

In this paper, we use the framework of neural machine translation to learn joint sentence representations across six very different languages. Our aim is a representation that is independent of the language and is therefore likely to capture the underlying semantics. We define a new cross-lingual similarity measure, compare up to 1.4M sentence representations and study the characteristics of close sentences. We provide experimental evidence that sentences that are close in embedding space are indeed semantically highly related, but often have quite different structure and syntax. These relations also hold when comparing sentences in different languages.
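As a rough illustration of what "close in embedding space" means, the sketch below ranks candidate sentences by cosine similarity to a query vector. It is a minimal example under stated assumptions: the abstract does not specify the paper's own cross-lingual similarity measure, so plain cosine similarity is swapped in here. The helper names (`cosine_similarity`, `nearest_neighbours`), the three-dimensional vectors and the example sentences are all hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(query, candidates, k=1):
    """Return the k (label, score) pairs most similar to `query`."""
    scored = [
        (label, cosine_similarity(query, vec))
        for label, vec in candidates.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for encoder outputs (made up for illustration;
# real sentence representations would have far more dimensions).
embeddings = {
    "en: the cat sleeps": [0.9, 0.1, 0.0],
    "fr: le chat dort": [0.8, 0.2, 0.1],
    "es: el mercado cae": [0.0, 0.2, 0.9],
}

# Use the English sentence as the query and search the remaining ones.
query = embeddings.pop("en: the cat sleeps")
top = nearest_neighbours(query, embeddings, k=1)
print(top[0][0])
```

With these toy values the French paraphrase ranks above the unrelated Spanish sentence, which mirrors the kind of cross-language neighbour lookup the abstract describes.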

Code Implementations: 1 repo
