CL LG · Jun 28, 2019

Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

Eva Vanmassenhove, Dimitar Shterionov, Andy Way

arXiv:1906.12068v131.31112 citations

Originality: Incremental advance

AI Analysis

The paper addresses reduced linguistic diversity and potential bias amplification in machine translation. The advance is incremental because it builds on existing concerns about MT quality and fairness.

This paper quantifies the loss of lexical richness in Machine Translation systems compared to Human Translation, finding that MT systems fail to generate diverse outputs and exacerbate frequent patterns, which may contribute to issues like gender bias.

This work presents an empirical approach to quantifying the loss of lexical richness in Machine Translation (MT) systems compared to Human Translation (HT). Our experiments show how current MT systems indeed fail to render the lexical diversity of human-generated or translated text. The inability of MT systems to generate diverse outputs, and their tendency to exacerbate already frequent patterns while ignoring less frequent ones, might be the underlying cause of, among others, the currently heavily debated issues related to gender-biased output. Can we indeed, aside from biased data, talk about an algorithm that exacerbates seen biases?
