Are Mutually Intelligible Languages Easier to Translate?
This work addresses data efficiency in machine translation between linguistically related languages, but its contribution is incremental, as it builds on existing translation methods.
The study investigated whether mutual intelligibility between languages reduces the amount of data needed to train neural machine translation models, finding a strong correlation between the area under the learning curve and human-assessed mutual intelligibility scores for Romance languages.
Two languages are considered mutually intelligible if their native speakers can communicate with each other while each uses their own mother tongue. How does the fact that humans perceive a language pair as mutually intelligible affect the ability to learn a translation model between them? We hypothesize that the amount of data needed to train a neural machine translation model is inversely proportional to the languages' mutual intelligibility. Experiments on the Romance language group reveal that there is indeed a strong correlation between the area under a model's learning curve and mutual intelligibility scores obtained by studying human speakers.
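To make the measured quantity concrete, the following is a minimal sketch of how an area under a learning curve could be computed and correlated with mutual intelligibility scores. It assumes that each learning curve plots a translation quality score (e.g. BLEU) against the fraction of the training data used, that the area is computed with the trapezoidal rule, and that Pearson correlation is the statistic. The paper's exact curve definition and correlation measure may differ. All numbers below are hypothetical toy values, not results from the study. Under these assumptions, a larger area means the model reaches good quality with less data, so the hypothesis predicts a positive correlation between area and intelligibility.

```python
import math

def learning_curve_area(sizes, scores):
    """Trapezoidal area under a learning curve (score vs. training-data fraction)."""
    if len(sizes) != len(scores) or len(sizes) < 2:
        raise ValueError("need at least two (size, score) points of equal length")
    area = 0.0
    for i in range(1, len(sizes)):
        # Width of the interval times the mean height of its two endpoints.
        area += (sizes[i] - sizes[i - 1]) * (scores[i] + scores[i - 1]) / 2
    return area

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: fractions of the full training set used.
fractions = [0.1, 0.25, 0.5, 1.0]
# Hypothetical quality scores per language pair at each fraction.
curves = {
    "pair_A": [20, 28, 33, 36],
    "pair_B": [12, 20, 27, 32],
    "pair_C": [5, 11, 18, 25],
}
# Hypothetical mutual intelligibility scores for the same pairs.
intelligibility = {"pair_A": 0.8, "pair_B": 0.6, "pair_C": 0.3}

pairs = sorted(curves)
areas = [learning_curve_area(fractions, curves[p]) for p in pairs]
r = pearson(areas, [intelligibility[p] for p in pairs])
print([round(a, 3) for a in areas], round(r, 3))
```

Normalizing the x-axis to fractions of the full training set keeps the areas comparable across language pairs whose corpora differ in absolute size; a log-scaled axis would be an equally plausible choice.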