CL, SE · Mar 2, 2023

Letz Translate: Low-Resource Machine Translation for Luxembourgish

arXiv:2303.01347v1 · 5 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This paper addresses the problem of deploying machine translation for low-resource languages in constrained environments such as mobile devices. The contribution is incremental, as it builds on existing distillation techniques.

The paper tackles low-resource machine translation for Luxembourgish by distilling knowledge from a large multilingual model and leveraging related high-resource languages. The resulting models are over 30% faster, with only a 4% performance drop compared to the state-of-the-art NLLB model.

Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research in multilingual models has shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in constrained environments (e.g., mobile/IoT devices or limited/old servers) impractical. In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much in performance. We also make use of high-resource languages that are related or share the same linguistic root as the target LRL. For our evaluation, we consider Luxembourgish as the LRL that shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30% faster and perform only 4% lower compared to the large state-of-the-art NLLB model.
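
To make the teacher/student idea in the abstract concrete, here is a minimal sketch of a classic soft-label distillation loss for a single output token. It is an illustration only: the abstract does not say which distillation objective the authors use, and the function names, the `temperature` and `alpha` parameters, and the toy logits are all assumptions introduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Hypothetical per-token loss: mix of soft (teacher) and hard (reference) terms.

    alpha weights the soft term; (1 - alpha) weights the usual cross-entropy
    against the reference token. These defaults are illustrative assumptions.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature^2 keeps the soft term's gradient magnitude comparable.
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_loss = -math.log(softmax(student_logits)[target_index] + 1e-12)
    return alpha * soft_loss + (1 - alpha) * hard_loss


# Toy 3-word vocabulary: the teacher prefers index 2.
teacher = [1.0, 2.0, 3.0]
print(distillation_loss([0.9, 2.1, 3.0], teacher, target_index=2))  # student near teacher
print(distillation_loss([3.0, 2.0, 1.0], teacher, target_index=2))  # student far from teacher
```

A student whose scores track the teacher's receives a smaller loss than one that disagrees, which is the mechanism by which a smaller model can inherit behaviour from a larger one. In a sequence-to-sequence setting such a term would be summed over output positions; an alternative, also not confirmed by the abstract, is to train the student directly on translations generated by the teacher.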

