CL · Feb 9, 2019

Multilingual Neural Machine Translation With Soft Decoupled Encoding

arXiv:1902.03499v1 · 66 citations
AI Analysis

This work addresses the problem of data scarcity in multilingual translation for low-resource languages, representing an incremental improvement over existing methods.

The paper tackles the challenge of learning word representations in low-resource multilingual neural machine translation by proposing Soft Decoupled Encoding, a framework that shares lexical-level information without heuristic preprocessing, resulting in gains of up to 2 BLEU and achieving state-of-the-art on four language pairs.

Multilingual training of neural machine translation (NMT) systems has led to impressive accuracy improvements on low-resource languages. However, there are still significant challenges in efficiently learning word representations in the face of paucity of data. In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data. SDE represents a word by its spelling through a character encoding, and its semantic meaning through a latent embedding space shared by all languages. Experiments on a standard dataset of four low-resource languages show consistent improvements over strong multilingual NMT baselines, with gains of up to 2 BLEU on one of the tested languages, achieving the new state-of-the-art on all four language pairs.
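The two-part representation described in the abstract can be illustrated with a minimal sketch. The abstract states only that a word is represented by a character encoding of its spelling plus a latent embedding space shared by all languages. Everything else below is an assumption made for illustration: the hashed character n-gram features, the per-language projection, the attention lookup over the shared latent table, the residual sum, the class name `ToySDE`, the language tags, and all sizes. Weights are random and untrained, so this shows the data flow only, not the authors' implementation.

```python
import math
import random
import zlib


def char_ngrams(word, max_n=3):
    """Return every character n-gram of `word` for n = 1..max_n."""
    return [word[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(word) - n + 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToySDE:
    """Hypothetical, untrained sketch of a spelling + shared-latent word encoder."""

    def __init__(self, languages, dim=8, n_latent=16, n_buckets=512, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.dim = dim
        self.n_buckets = n_buckets
        # Assumption: n-grams are hashed into a fixed table shared by all languages.
        self.ngram_table = rand_matrix(n_buckets, dim)
        # Assumption: one small projection per language.
        self.lang_proj = {lang: rand_matrix(dim, dim) for lang in languages}
        # Latent semantic table shared by all languages.
        self.latent = rand_matrix(n_latent, dim)

    def spelling_vector(self, word):
        """Language-independent encoding of the word's spelling."""
        total = [0.0] * self.dim
        for gram in char_ngrams(word):
            bucket = zlib.crc32(gram.encode("utf-8")) % self.n_buckets
            total = [t + r for t, r in zip(total, self.ngram_table[bucket])]
        return [math.tanh(t) for t in total]

    def encode(self, word, lang):
        """Return (word embedding, attention weights over the latent table)."""
        spelling = self.spelling_vector(word)
        proj = self.lang_proj[lang]
        # Row vector times the language-specific projection matrix.
        query = [math.tanh(sum(spelling[i] * proj[i][j] for i in range(self.dim)))
                 for j in range(self.dim)]
        # Soft lookup: attend over the shared latent embeddings.
        weights = softmax([dot(query, slot) for slot in self.latent])
        semantic = [sum(w * slot[j] for w, slot in zip(weights, self.latent))
                    for j in range(self.dim)]
        # Residual sum keeps the spelling signal next to the shared semantics.
        return [q + s for q, s in zip(query, semantic)], weights


# Illustrative usage with hypothetical language tags and an arbitrary word.
sde = ToySDE(languages=["aze", "tur"])
vec_aze, weights_aze = sde.encode("kitab", "aze")
vec_tur, weights_tur = sde.encode("kitab", "tur")
```

In this sketch the same spelling yields the same n-gram features in every language, while the per-language projection lets each language query the shared latent table differently. No word segmentation step is required, which matches the abstract's point about avoiding heuristic preprocessing.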

Code Implementations: 1 repo