Mar 20, 2021

The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation

arXiv:2103.11189v1
802 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of choosing an effective subword segmentation method for low-resource neural machine translation. Its contribution is incremental, since it finds no clear advantage for the morphology-aware methods over existing approaches such as BPE.

The paper evaluates modern subword segmentation methods, including the morphology-aware LMVR and MORSEL, for low-resource neural machine translation between English and each of Nepali, Sinhala, and Kazakh. It finds no consistent performance differences relative to BPE, and the results are often statistically indistinguishable.
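The claim that results are "statistically indistinguishable" rests on a significance test between systems. The abstract does not say which test the authors ran; paired bootstrap resampling (Koehn, 2004) is a common choice in MT evaluation, so the following is a minimal sketch of that technique, not the paper's procedure. As a simplifying assumption, it averages hypothetical per-sentence scores; a corpus-level metric such as BLEU would instead be recomputed on each resample. The function name `paired_bootstrap` is hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, num_samples=1000, seed=0):
    """Return the fraction of bootstrap resamples in which system A beats system B.

    scores_a and scores_b are per-sentence scores for the SAME test sentences,
    in the same order. Simplifying assumption: the system-level score is the mean
    of sentence scores (corpus BLEU would be recomputed per resample instead).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    n = len(scores_a)
    wins = 0
    for _ in range(num_samples):
        # Draw n sentence indices with replacement; both systems share the draw,
        # which is what makes the test "paired".
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / num_samples


# Hypothetical per-sentence scores for two segmentation setups.
bpe_scores = [0.21, 0.35, 0.18, 0.40, 0.27, 0.33, 0.25, 0.30]
morph_scores = [0.23, 0.31, 0.20, 0.38, 0.29, 0.32, 0.24, 0.31]
print(paired_bootstrap(morph_scores, bpe_scores))
```

If system A wins in at least 95% of resamples, its advantage is commonly treated as significant at the 0.05 level; otherwise the two systems cannot be reliably told apart on that test set.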

This paper evaluates the performance of several modern subword segmentation methods in a low-resource neural machine translation setting. We compare segmentations produced by applying BPE at the token or sentence level with morphologically-based segmentations from LMVR and MORSEL. We evaluate translation tasks between English and each of Nepali, Sinhala, and Kazakh, and predict that morphologically-based segmentation methods will lead to better performance in this setting. However, compared to BPE, we find that no consistent and reliable differences emerge between the segmentation methods. While morphologically-based methods outperform BPE in a few cases, the best-performing method tends to vary across tasks, and the performance of the segmentation methods is often statistically indistinguishable.
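To make "applying BPE at the token level" concrete, here is a minimal from-scratch sketch of BPE merge learning in the style of Sennrich et al. (2016). It is an illustration, not the toolkit or settings the authors used; the function names `learn_bpe` and `segment` and the toy corpus are hypothetical. Words are pre-split on whitespace and an end-of-word marker `</w>` is appended. By contrast, tools such as SentencePiece operate on raw sentences and treat whitespace as an ordinary symbol.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of `pair` with its concatenation."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    # A lambda avoids backslash interpretation in the replacement string.
    return {pattern.sub(lambda m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(corpus_tokens, num_merges):
    """Learn up to `num_merges` merges from whitespace-split tokens."""
    # Each word becomes space-separated characters plus an end-of-word marker.
    vocab = collections.Counter(" ".join(tok) + " </w>" for tok in corpus_tokens)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def segment(word, merges):
    """Apply learned merges, in order, to a (possibly unseen) word."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges, _ = learn_bpe(corpus, num_merges=10)
print(merges[:3])                 # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segment("lowest", merges))  # ['low', 'est</w>']
```

Here the unseen word `lowest` is split into `low` + `est</w>` purely because those pieces are frequent, and the split happens to match a morpheme boundary. Morphology-aware methods such as LMVR and MORSEL aim to produce segmentations grounded in morphology rather than in frequency alone.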
