CL · Oct 5, 2021

Sicilian Translator: A Recipe for Low-Resource NMT

arXiv:2110.01938v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses machine translation for a low-resource language, Sicilian. The contribution is incremental: it applies existing methods such as Transformers and backtranslation to a new dataset.

The authors tackled low-resource neural machine translation for Sicilian-English with only 17,000 sentence pairs. Their baseline models reached BLEU scores in the upper 20s, and adding backtranslation and multilingual translation raised the scores to the mid 30s.

With 17,000 pairs of Sicilian-English translated sentences, Arba Sicula developed the first neural machine translator for the Sicilian language. Using small subword vocabularies, we trained small Transformer models with high dropout parameters and achieved BLEU scores in the upper 20s. Then we supplemented our dataset with backtranslation and multilingual translation and pushed our scores into the mid 30s. We also attribute our success to incorporating theoretical information in our dataset. Prior to training, we biased the subword vocabulary towards the desinences one finds in a textbook. And we included textbook exercises in our dataset.
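The backtranslation step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the goal is a Sicilian-to-English model, that monolingual English text is available, and that some reverse (English-to-Sicilian) model exists. The names `backtranslate`, `en_to_scn`, and `toy_en_to_scn` are hypothetical, and the toy reverse model only tags its input instead of translating it.

```python
from typing import Callable, List, Tuple

# A training pair is (Sicilian source, English target).
Pair = Tuple[str, str]


def backtranslate(
    parallel: List[Pair],
    mono_english: List[str],
    en_to_scn: Callable[[str], str],
) -> List[Pair]:
    """Augment authentic pairs with synthetic ones.

    Each monolingual English sentence is translated into Sicilian by the
    reverse model, and the synthetic Sicilian output is paired with the
    authentic English sentence as its target.
    """
    synthetic = [(en_to_scn(en), en) for en in mono_english]
    return parallel + synthetic


def toy_en_to_scn(sentence: str) -> str:
    """Hypothetical stand-in for a trained English-to-Sicilian model.

    It does not translate anything; it only marks the text as synthetic
    so the data flow is visible.
    """
    return "<synthetic-scn> " + sentence


# Placeholder strings, not real corpus data.
parallel = [("scn sentence 1", "en sentence 1")]
mono_english = ["en sentence 2", "en sentence 3"]

augmented = backtranslate(parallel, mono_english, toy_en_to_scn)
for src, tgt in augmented:
    print(src, "->", tgt)
```

The key design point is that the target side of every synthetic pair is authentic text, so the forward model still learns to produce fluent output even when the synthetic source side is noisy.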

Code Implementations: 1 repo