Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020
This work provides an incremental improvement in domain adaptation techniques for English-Hindi NMT, specifically for researchers and practitioners working with low-resource Indian languages in specialized domains.
This paper addresses domain adaptation for English-Hindi Neural Machine Translation (NMT) in the AI and Chemistry domains. The authors used LSTM and Transformer architectures with fine-tuning and mixed-domain data approaches, placing first in the chemistry and general domain tasks and second in the AI domain task.
Recent advances in Neural Machine Translation (NMT) have produced state-of-the-art results on machine translation for low-resource Indian languages. This paper describes the NMT systems for the English-Hindi language pair submitted to the AdapMT Shared Task at ICON 2020. The shared task aims to build translation systems for Indian languages in specific domains, such as Artificial Intelligence (AI) and Chemistry, using a small in-domain parallel corpus. We evaluate the effectiveness of two popular NMT architectures, LSTM and Transformer, on the English-Hindi machine translation task using BLEU scores. We train these models primarily on out-of-domain data and employ simple domain adaptation techniques chosen according to the characteristics of the in-domain dataset, namely fine-tuning and mixed-domain data training. Our team was ranked first in the chemistry and general domain En-Hi translation tasks and second in the AI domain En-Hi translation task.
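The two adaptation strategies named above differ mainly in how the training data is arranged, which a small sketch can make concrete. The Python below is only an illustration under stated assumptions, not the authors' pipeline: the function names `build_mixed_corpus` and `fine_tuning_schedule`, the `oversample` parameter, and the toy sentence pairs are all hypothetical, and oversampling the in-domain data is one common way to build a mixed-domain corpus rather than something the abstract specifies.

```python
import random


def build_mixed_corpus(out_domain, in_domain, oversample=1, seed=0):
    """Mixed-domain data: one training set containing both kinds of pairs.

    The in-domain pairs are repeated `oversample` times (an assumption of
    this sketch) so the small corpus is not swamped by the large one.
    """
    if oversample < 1:
        raise ValueError("oversample must be >= 1")
    mixed = list(out_domain) + list(in_domain) * oversample
    # A seeded shuffle keeps the example reproducible.
    random.Random(seed).shuffle(mixed)
    return mixed


def fine_tuning_schedule(out_domain, in_domain):
    """Fine-tuning: train on out-of-domain data first, then continue
    training the same model on the in-domain data only."""
    return [("pretrain", list(out_domain)), ("fine_tune", list(in_domain))]


# Toy (English, Hindi) pairs, invented for illustration only.
general_pairs = [
    ("Good morning.", "सुप्रभात।"),
    ("Thank you.", "धन्यवाद।"),
    ("How are you?", "आप कैसे हैं?"),
]
chemistry_pairs = [("Water is a compound.", "जल एक यौगिक है।")]

mixed = build_mixed_corpus(general_pairs, chemistry_pairs, oversample=3)
stages = fine_tuning_schedule(general_pairs, chemistry_pairs)

print(len(mixed))                     # 3 general pairs + 3 copies of the in-domain pair
print([name for name, _ in stages])   # order of the two training stages
```

In this sketch the mixed-domain approach yields a single training run over `mixed`, while fine-tuning yields two consecutive runs over the stages in `stages`; which one works better would depend on the size and nature of the in-domain corpus.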