Developing neural machine translation models for Hungarian-English
This work studies neural machine translation for the Hungarian-English language pair in both directions, trained on the Hunglish2 corpus, with a focus on data augmentation during training.
The thesis evaluates structure-aware data augmentation methods based on dependency trees for Hungarian-English neural machine translation. The best models reach BLEU scores of 33.9 for Hungarian-English and 28.6 for English-Hungarian.
I train neural machine translation (NMT) models for English-Hungarian and Hungarian-English using the Hunglish2 corpus. The main contribution of this work is an evaluation of data augmentation methods applied during NMT training. I propose five structure-aware augmentation methods: instead of selecting words for blanking or replacement at random, they use the dependency tree of each sentence as the basis for augmentation. The thesis begins with a detailed literature review of neural networks, sequential modeling, neural machine translation, dependency parsing, and data augmentation. After an exploratory data analysis and preprocessing of the Hunglish2 corpus, I run experiments with the proposed augmentation techniques. The best Hungarian-English model achieves a BLEU score of 33.9, and the best English-Hungarian model achieves a BLEU score of 28.6.
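
To illustrate what "structure-aware" can mean in practice, the following is a minimal sketch of one possible dependency-based blanking operation. It is not taken from the thesis and is not one of the five proposed methods. The function names (`subtree_indices`, `blank_subtree`), the `<blank>` placeholder token, and the hand-written dependency heads for the example sentence are all assumptions made for illustration. In a real pipeline the head indices would come from a dependency parser.

```python
import random
from typing import List


def subtree_indices(heads: List[int], root: int) -> List[int]:
    """Return the sorted token indices of `root` and all its descendants.

    `heads[i]` is the index of token i's head; the sentence root has head -1.
    """
    children = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)

    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(children[node])
    return sorted(seen)


def blank_subtree(tokens: List[str], heads: List[int],
                  rng: random.Random, blank: str = "<blank>") -> List[str]:
    """Blank one randomly chosen dependency subtree (hypothetical operation).

    Only non-root tokens are candidates, so the sentence root always survives.
    A uniform random blanking scheme would instead pick isolated words
    without regard to the tree.
    """
    candidates = [i for i, head in enumerate(heads) if head >= 0]
    if not candidates:
        return list(tokens)
    target = rng.choice(candidates)
    to_blank = set(subtree_indices(heads, target))
    return [blank if i in to_blank else tok for i, tok in enumerate(tokens)]


# Hand-annotated example (assumed heads, not parser output).
tokens = ["The", "black", "cat", "sat", "on", "the", "mat"]
heads = [2, 2, 3, -1, 6, 6, 3]  # "sat" is the root

augmented = blank_subtree(tokens, heads, random.Random(0))
print(" ".join(augmented))
```

Because whole subtrees are blanked, a phrase such as "on the mat" is removed as a unit, and the remaining tokens still form a connected fragment of the original tree. Whether this helps translation quality is an empirical question of the kind the experiments in the thesis address.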