CL, LG · Jun 5, 2019

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

arXiv:1906.01942v11089 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of extracting bilingual sentence pairs from noisy or nonparallel data for machine translation. It represents an incremental improvement.

The paper tackles learning bilingual sentence embeddings by combining autoencoding and neural machine translation to align source and target sentences in a shared space, achieving promising results on sentence alignment recovery and WMT 2018 parallel corpus filtering tasks.

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.
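The pair-scoring step described above can be illustrated with a minimal sketch. It assumes the source and target sentence embeddings have already been computed by some encoder and shows only an untrained forward pass of a small multilayer perceptron over them. The concatenated feature layout, the layer sizes, the ReLU and sigmoid activations, the 0.5 threshold, and every function name (`init_layer`, `pair_score`, `filter_pairs`) are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply an affine transformation: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pair_score(src_emb, tgt_emb, hidden, output):
    """Score a (source, target) embedding pair with a one-hidden-layer MLP.

    Assumption: the two embeddings are simply concatenated as input features.
    """
    features = list(src_emb) + list(tgt_emb)
    h = relu(dense(features, hidden))
    (logit,) = dense(h, output)  # the output layer has a single unit
    return sigmoid(logit)


def filter_pairs(pairs, hidden, output, threshold=0.5):
    """Keep the pairs whose score reaches the (hypothetical) threshold."""
    return [(s, t) for s, t in pairs
            if pair_score(s, t, hidden, output) >= threshold]


# Toy usage with 4-dimensional embeddings and random, untrained weights.
rng = random.Random(0)
dim = 4
hidden_layer = init_layer(2 * dim, 8, rng)
output_layer = init_layer(8, 1, rng)
src = [0.1, -0.2, 0.3, 0.4]
tgt = [0.0, 0.5, -0.1, 0.2]
score = pair_score(src, tgt, hidden_layer, output_layer)
```

With random weights the score carries no meaning. In a setup like the one the abstract describes, the perceptron would first be trained so that good bilingual pairs score higher than noisy or nonparallel ones, after which a filter such as `filter_pairs` could be applied to a candidate corpus.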

