CL · LG · Nov 19, 2019

A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages

arXiv:1911.08117v11084 citations
Originality: Incremental advance
AI Analysis

This paper addresses translation challenges for morphologically rich languages, offering incremental improvements over existing phrase-based models.

The paper tackles machine translation for morphologically rich languages by proposing a hybrid morpheme-word representation, which yields statistically significant improvements in BLEU scores and human judgments for English-to-Finnish translation.

We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
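To make the hybrid representation concrete, here is a minimal Python sketch of one way a morpheme stream could keep track of word boundaries. It is an illustration under stated assumptions, not the authors' implementation: the trailing `+` join marker, the function names, and the hand-made segmentation of the Finnish word "taloissa" are all hypothetical, and the boundary check is only one possible reading of "word boundary-aware" phrase extraction, which the abstract does not spell out.

```python
from typing import List

# Assumed marker convention (hypothetical, not taken from the abstract):
# a trailing "+" means "the next morpheme belongs to the same word".
JOIN = "+"


def to_morphemes(segmented_words: List[List[str]]) -> List[str]:
    """Flatten pre-segmented words into one morpheme stream with join markers."""
    stream = []
    for morphemes in segmented_words:
        for i, morpheme in enumerate(morphemes):
            is_last = i == len(morphemes) - 1
            stream.append(morpheme if is_last else morpheme + JOIN)
    return stream


def to_words(stream: List[str]) -> List[str]:
    """Rebuild words from a morpheme stream by merging across join markers."""
    words, current = [], ""
    for token in stream:
        if token.endswith(JOIN):
            current += token[: -len(JOIN)]
        else:
            words.append(current + token)
            current = ""
    if current:  # dangling join marker at the end: keep the partial word
        words.append(current)
    return words


def respects_word_boundaries(stream: List[str], start: int, end: int) -> bool:
    """Return True if stream[start:end] starts and ends on a word boundary."""
    if not (0 <= start < end <= len(stream)):
        return False
    starts_word = start == 0 or not stream[start - 1].endswith(JOIN)
    ends_word = not stream[end - 1].endswith(JOIN)
    return starts_word and ends_word


# Hand-segmented toy input (an assumption for illustration only).
stream = to_morphemes([["talo", "i", "ssa"], ["on"]])
words = to_words(stream)
```

Because `to_words` restores the original word sequence, a system that translates morpheme by morpheme could still be scored at the word level, which is the idea behind tuning a morpheme-level model against word-level BLEU. A check like `respects_word_boundaries` could be used to filter candidate phrase spans so that none starts or ends in the middle of a word.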

