LG, CL · May 28, 2021

Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation

arXiv:2105.13573v1
Originality: Synthesis-oriented
AI Analysis

This work addresses machine translation of code-mixed Modern Standard Arabic and Egyptian Arabic text into English. Its contribution is incremental: it applies existing methods to a new data setting.

The paper tackles machine translation from code-mixed Modern Standard Arabic and Egyptian Arabic into English, achieving 25.72 BLEU and ranking first in the official shared task evaluation.

Recent progress in neural machine translation (NMT) has made it possible to translate successfully between monolingual language pairs where large parallel data exist, with pre-trained models improving performance even further. Although there exists work on translating in code-mixed settings (where one side of the language pair includes text from two or more languages), it is still unclear what recent success in NMT and language modeling exactly means for translating code-mixed text. We investigate one such context, namely MT from code-mixed Modern Standard Arabic and Egyptian Arabic (MSAEA) into English. We develop models under different conditions, employing both (i) standard end-to-end sequence-to-sequence (S2S) Transformers trained from scratch and (ii) pre-trained S2S language models (LMs). We are able to acquire reasonable performance using only MSA-EN parallel data with S2S models trained from scratch. We also find that LMs fine-tuned on data from various Arabic dialects help the MSAEA-EN task. Our work is in the context of the Shared Task on Machine Translation in Code-Switching. Our best model achieves 25.72 BLEU, placing us first on the official shared task evaluation for MSAEA-EN.
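Results here are reported in BLEU. As a rough illustration of what that number measures, the sketch below computes standard corpus-level BLEU (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty). It is a minimal sketch under stated assumptions: one reference per hypothesis, plain whitespace tokenization, and no smoothing. The shared task's exact scoring script and tokenization are not described on this page, so scores from this sketch are not directly comparable to the 25.72 reported above; `corpus_bleu` and `ngrams` are hypothetical helper names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (not from the paper): one reference per hypothesis,
    whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # a zero precision makes the unsmoothed geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    ref = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], ref))  # identical output
    print(corpus_bleu(["the cat sat on a mat"], ref))    # one wrong word
```

For the second call, the clipped precisions are 5/6, 3/5, 2/4 and 1/3 and the lengths match, so the score is 100 · (1/12)^(1/4), roughly 53.7. In practice, shared tasks usually rely on a reference implementation such as sacreBLEU, which fixes tokenization and other settings so that scores are reproducible across systems.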
