CL Feb 26, 2020

Marathi To English Neural Machine Translation With Near Perfect Corpus And Transformers

arXiv:2002.11643v1 0.79 citations Has Code

Originality Synthesis-oriented

AI Analysis

This work provides a competitive translation system for Marathi, a language with around 95 million speakers that major services such as Bing do not support. The contribution is incremental, since it builds on existing Transformer methods.

The paper addresses the lack of benchmarks for Neural Machine Translation (NMT) on Indian languages by training Marathi-to-English translators with the Hugging Face BERT tokenizer and Transformer architectures, reporting better BLEU scores than Google on the Tatoeba and Wikimedia datasets.

There have been very few attempts to benchmark the performance of state-of-the-art algorithms for the Neural Machine Translation task on Indian languages. Google, Bing, Facebook and Yandex are among the very few companies that have built translation systems for a few of the Indian languages. Among them, translation results from Google are supposed to be better, based on general inspection. Bing Translator does not even support the Marathi language, which has around 95 million speakers and ranks 15th in the world in terms of combined primary and secondary speakers. In this exercise, we trained and compared a variety of neural Marathi-to-English translators, built with the BERT tokenizer by Hugging Face and various Transformer-based architectures on Facebook's Fairseq platform, using a limited but almost correct parallel corpus, and achieved better BLEU scores than Google on the Tatoeba and Wikimedia open datasets.
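The comparison above rests on BLEU, so a minimal sketch of how a corpus-level BLEU score can be computed may help. The abstract does not say which BLEU implementation or tokenization the authors used. The function names `ngrams` and `corpus_bleu` are hypothetical, and the sketch assumes one reference per sentence, whitespace tokenization, n-grams up to 4, and no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not taken from the paper): a single reference per
    hypothesis, whitespace tokenization, uniform n-gram weights,
    and no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    system_output = ["the cat sat on the mat today"]
    reference = ["the cat sat on the mat"]
    print(round(corpus_bleu(system_output, reference), 2))
```

Scores from a sketch like this are only comparable when both systems are evaluated with the same tokenization and references, which is why published comparisons usually rely on a standard tool instead.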
