Benchmarking Azerbaijani Neural Machine Translation
This work addresses the scarcity of NMT resources for Azerbaijani and provides benchmarks for researchers and practitioners, but it is incremental, since it applies existing methods to a new language.
The paper addresses the lack of research on Neural Machine Translation for Azerbaijani by benchmarking Azerbaijani-English systems across techniques and datasets. It finds that Unigram segmentation improves performance and that models scale better with dataset quality than with quantity, though cross-domain generalization remains challenging.
Little research has been done on Neural Machine Translation (NMT) for Azerbaijani. In this paper, we benchmark the performance of Azerbaijani-English NMT systems on a range of techniques and datasets. We evaluate which segmentation techniques work best for Azerbaijani translation and benchmark the performance of Azerbaijani NMT models across several text domains. Our results show that Unigram segmentation improves NMT performance and that Azerbaijani translation models scale better with dataset quality than with quantity, but cross-domain generalization remains a challenge.
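To show what Unigram segmentation does with an agglutinative Azerbaijani word, here is a minimal sketch. The abstract does not name a toolkit or vocabulary. The vocabulary and its probabilities below are hypothetical, hand-picked for illustration, and do not come from the paper. Given a unigram language model over subwords, segmentation chooses the split that maximizes the product of piece probabilities, which is found with Viterbi-style dynamic programming. Real Unigram tokenizers, such as the one in SentencePiece, also learn the vocabulary and probabilities from a corpus with EM. That training step is not shown here.

```python
import math

# Hypothetical toy vocabulary: subword -> unigram probability (made up for illustration).
# A real Unigram model learns these values from training data.
VOCAB = {
    "kitab": 0.05, "lar": 0.04, "ımız": 0.02, "da": 0.06,
    # single-character fallbacks so most words remain segmentable
    "k": 0.001, "i": 0.01, "t": 0.001, "a": 0.01, "b": 0.001,
    "l": 0.001, "r": 0.001, "ı": 0.005, "m": 0.001, "z": 0.001, "d": 0.001,
}


def unigram_segment(word, vocab):
    """Return the most probable segmentation of `word` under a unigram LM (Viterbi)."""
    n = len(word)
    # best[i] = (log-prob of best segmentation of word[:i], start index of its last piece)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in vocab and best[start][0] > -math.inf:
                score = best[start][0] + math.log(vocab[piece])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # the vocabulary cannot cover this word
    # Walk the back-pointers from the end of the word to recover the pieces.
    pieces, end = [], n
    while end > 0:
        start = best[end][1]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


if __name__ == "__main__":
    # "kitablarımızda" ("in our books"): stem + plural + possessive + locative
    print(unigram_segment("kitablarımızda", VOCAB))
    # → ['kitab', 'lar', 'ımız', 'da']
```

The frequent morpheme-like pieces win because one higher-probability piece outscores several rare single-character pieces. This is one intuition for why a learned subword vocabulary can suit an agglutinative language. The search is quadratic in word length, which is cheap for individual words.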