CL, LG · Nov 7, 2021

Developing neural machine translation models for Hungarian-English

arXiv:2111.04099v1 · 1 citation

Originality: Incremental advance

AI Analysis

This work addresses translation quality for the low-resource Hungarian-English language pair, presenting incremental improvements through novel augmentation techniques.

The paper tackles neural machine translation for Hungarian-English by evaluating structure-aware data augmentation methods based on dependency trees, achieving BLEU scores of 33.9 for Hungarian-English and 28.6 for English-Hungarian.

I train models for the task of neural machine translation for English-Hungarian and Hungarian-English, using the Hunglish2 corpus. The main contribution of this work is evaluating different data augmentation methods during the training of NMT models. I propose 5 different augmentation methods that are structure-aware, meaning that instead of randomly selecting words for blanking or replacement, the dependency tree of sentences is used as a basis for augmentation. I start my thesis with a detailed literature review on neural networks, sequential modeling, neural machine translation, dependency parsing and data augmentation. After a detailed exploratory data analysis and preprocessing of the Hunglish2 corpus, I perform experiments with the proposed data augmentation techniques. The best model for Hungarian-English achieves a BLEU score of 33.9, while the best model for English-Hungarian achieves a BLEU score of 28.6.
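The abstract does not spell out the five augmentation methods, so the following is only a minimal sketch of the general idea of structure-aware blanking: instead of blanking randomly chosen words, a node of the dependency tree is chosen and its whole subtree is blanked. The `<blank>` token, the function names, the head-index encoding (`-1` for the root) and the toy English parse are all assumptions made for illustration; they are not taken from the thesis.

```python
import random
from typing import Dict, List, Optional

BLANK = "<blank>"  # hypothetical placeholder token, not from the thesis


def subtree(heads: List[int], node: int) -> List[int]:
    """Return the sorted indices of `node` and all of its descendants.

    `heads[i]` is the index of the head of token i, or -1 for the root.
    """
    children: Dict[int, List[int]] = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)
    stack, seen = [node], []
    while stack:
        cur = stack.pop()
        seen.append(cur)
        stack.extend(children.get(cur, []))
    return sorted(seen)


def blank_subtree(
    tokens: List[str],
    heads: List[int],
    node: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Blank one dependency subtree instead of randomly chosen words.

    If `node` is None, a non-root token is sampled so that the whole
    sentence is never blanked at once.
    """
    rng = rng or random.Random()
    if node is None:
        candidates = [i for i, h in enumerate(heads) if h != -1]
        if not candidates:
            return list(tokens)
        node = rng.choice(candidates)
    covered = set(subtree(heads, node))
    return [BLANK if i in covered else tok for i, tok in enumerate(tokens)]


# Toy parse (assumed): "chased" is the root, "dog" and "cat" depend on it.
tokens = ["The", "old", "dog", "chased", "a", "cat"]
heads = [2, 2, 3, -1, 5, 3]

# Blanking the subtree rooted at "dog" removes the whole subject phrase.
print(blank_subtree(tokens, heads, node=2))
```

In practice the head indices would come from a dependency parser run on each side of the parallel corpus. A replacement variant could follow the same pattern, substituting the selected tokens with other words instead of the blank token.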
