CL AI LG

Mar 7, 2021

Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings

Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg

arXiv:2103.04225v31.24 citations Has Code

Originality Synthesis-oriented

AI Analysis

This paper addresses translation challenges in low-resource, morphologically-unmarked settings for Yorùbá-English MT, but it is incremental in that it compares existing methods on one specific linguistic issue.

The paper tackles the problem of translating bare nouns from Yorùbá to English, where differences in how (in)definiteness is marked cause ambiguities, by comparing SMT, BiLSTM, and Transformer models. The Transformer outperformed the SMT and BiLSTM models in 4 categories, the BiLSTM outperformed the SMT model in 3, and the SMT outperformed the NMT models in 1.

Translating between languages where certain features are marked morphologically in one but absent or marked contextually in the other is an important test case for machine translation. When translating into English, which marks (in)definiteness morphologically, from Yorùbá, which uses bare nouns but marks these features contextually, ambiguities arise. In this work, we perform a fine-grained analysis of how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns (BNs) in Yorùbá into English. We investigate to what extent the systems identify BNs and translate them correctly, and how they compare with human translation patterns. We also analyze the types of errors each model makes and provide a linguistic description of these errors. We glean insights for evaluating model performance in low-resource settings. In translating bare nouns, our results show that the Transformer model outperforms the SMT and BiLSTM models for 4 categories, the BiLSTM outperforms the SMT model for 3 categories, while the SMT outperforms the NMT models for 1 category.
