CL AI, Mar 29, 2021

Unsupervised Machine Translation On Dravidian Languages

arXiv:2103.15877v132.7804 citations

Originality: Incremental advance

AI Analysis

This work addresses translation challenges for low-resource Dravidian languages, but the advance is incremental: it builds on existing methods that use auxiliary parallel data.

The paper tackles unsupervised machine translation between English and Kannada, a low-resource Dravidian language. It uses auxiliary data from related languages and a unified writing system, and reports improved results from model architectures that maximize knowledge sharing.

Unsupervised neural machine translation (UNMT) is especially beneficial for low-resource languages such as those from the Dravidian family. However, UNMT systems tend to fail in realistic scenarios involving actual low-resource languages. Recent works propose to utilize auxiliary parallel data and have achieved state-of-the-art results. In this work, we focus on unsupervised translation between English and Kannada, a low-resource Dravidian language. We additionally utilize a limited amount of auxiliary data between English and other related Dravidian languages. We show that unifying the writing systems is essential in unsupervised translation between the Dravidian languages. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for distant language pairs. Our experiments demonstrate that it is crucial to include auxiliary languages that are similar to our focal language, Kannada. Furthermore, we propose a metric to measure language similarity and show that it serves as a good indicator for selecting the auxiliary languages.
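
The abstract does not say how the writing systems are unified. As a minimal sketch of one common approach, assuming a simple code-point offset between the parallel Unicode blocks of the major Indic scripts (the names `BLOCK_START`, `BLOCK_SIZE`, and `unify_script` are hypothetical and not taken from the paper), text in Tamil, Telugu, or Malayalam can be mapped into the Kannada block so that related languages share one vocabulary space:

```python
# Illustrative sketch only: NOT the paper's stated method.
# The Unicode blocks for these scripts are each 128 code points wide and
# follow a largely parallel layout, so a fixed offset maps most letters
# to their counterpart in another script.
BLOCK_START = {
    "devanagari": 0x0900,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def unify_script(text: str, target: str = "kannada") -> str:
    """Map characters from any listed Indic block into the target block.

    Characters outside the listed blocks (Latin letters, digits,
    punctuation) pass through unchanged. Script-specific characters may
    land on unassigned code points in the target block; a real pipeline
    would need to handle those cases explicitly.
    """
    target_start = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        for start in BLOCK_START.values():
            if start <= cp < start + BLOCK_SIZE:
                # Keep the position within the block, swap the block.
                out.append(chr(target_start + (cp - start)))
                break
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Telugu KA (U+0C15) maps to Kannada KA (U+0C95).
    print(unify_script("\u0c15") == "\u0c95")
```

An offset mapping is cheap and reversible for most characters, which is why it is a frequent first choice; a rule-based transliterator would be the more careful alternative where the blocks diverge.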
