CL Feb 25, 2019

Lost in Machine Translation: A Method to Reduce Meaning Loss

arXiv:1902.09514v41096 citations
Originality: Incremental advance
AI Analysis

This paper addresses ambiguous translations produced by machine translation systems, though the advance is incremental because it builds on existing pre-trained models.

The paper tackles meaning loss in machine translation: state-of-the-art systems often fail to preserve distinctions in meaning, for example by translating 'I cut my finger.' and 'I cut my finger off.' to the same ambiguous French sentence. It presents a method, based on Bayesian models of informative utterance production, that increases injectivity and improves cycle-consistency without reducing BLEU scores.

A desideratum of high-quality translation systems is that they preserve meaning, in the sense that two sentences with different meanings should not translate to one and the same sentence in another language. However, state-of-the-art systems often fail in this regard, particularly in cases where the source and target languages partition the "meaning space" in different ways. For instance, "I cut my finger." and "I cut my finger off." describe different states of the world but are translated to French (by both Fairseq and Google Translate) as "Je me suis coupé le doigt.", which is ambiguous as to whether the finger is detached. More generally, translation systems are typically many-to-one (non-injective) functions from source to target language, which in many cases results in important distinctions in meaning being lost in translation. Building on Bayesian models of informative utterance production, we present a method to define a less ambiguous translation system in terms of an underlying pre-trained neural sequence-to-sequence model. This method increases injectivity, resulting in greater preservation of meaning as measured by improvement in cycle-consistency, without impeding translation quality (measured by BLEU score).
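The idea in the abstract can be illustrated with a small toy sketch. It is not the paper's implementation: it assumes a generic Bayesian "informative speaker" recipe, in which a base model S0(target | source) is reweighted by how well a listener could recover the source from each candidate target. The probabilities, the placeholder targets `FR_INTACT` and `FR_DETACHED`, the uniform prior over sources, the `alpha` parameter, and the round-trip definition of cycle-consistency are all assumptions made for illustration.

```python
from typing import Dict

Dist = Dict[str, float]

SRC_A = "I cut my finger."
SRC_B = "I cut my finger off."
AMBIGUOUS = "Je me suis coupé le doigt."

# Hypothetical base model S0(target | source). The numbers are invented, and
# FR_INTACT / FR_DETACHED are placeholders for unambiguous French candidates.
S0: Dict[str, Dist] = {
    SRC_A: {AMBIGUOUS: 0.9, "FR_INTACT": 0.1},
    SRC_B: {AMBIGUOUS: 0.6, "FR_DETACHED": 0.4},
}


def normalize(scores: Dist) -> Dist:
    """Rescale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def listener(s0: Dict[str, Dist], target: str) -> Dist:
    """L(source | target), proportional to S0(target | source) (uniform prior)."""
    return normalize({src: dist.get(target, 0.0) for src, dist in s0.items()})


def informative_speaker(s0: Dict[str, Dist], source: str, alpha: float = 1.0) -> Dist:
    """S1(target | source), proportional to S0(target | source) * L(source | target)**alpha."""
    return normalize({
        target: prob * listener(s0, target)[source] ** alpha
        for target, prob in s0[source].items()
    })


def best(dist: Dist) -> str:
    """Return the highest-probability key of a distribution."""
    return max(dist, key=dist.get)


def cycle_consistency(s0: Dict[str, Dist], translate) -> float:
    """Fraction of sources recovered by translating, then taking the listener's best guess."""
    hits = sum(best(listener(s0, translate(src))) == src for src in s0)
    return hits / len(s0)


def base_translate(src: str) -> str:
    return best(S0[src])


def informative_translate(src: str) -> str:
    return best(informative_speaker(S0, src))


if __name__ == "__main__":
    for src in S0:
        print(src, "->", base_translate(src), "|", informative_translate(src))
    print("base cycle-consistency:", cycle_consistency(S0, base_translate))
    print("informative cycle-consistency:", cycle_consistency(S0, informative_translate))
```

In this toy setting the base model maps both English sentences to the same ambiguous French sentence, while the reweighted model keeps the ambiguous sentence for "I cut my finger." and switches "I cut my finger off." to the candidate that only that source produces, so the mapping becomes injective. With a real neural model the candidate set is not enumerable like this, so any practical version would need some approximation that this sketch does not attempt.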

Code Implementations: 1 repo