A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings
This work addresses data scarcity for low-resource languages in NLP, but its contribution is incremental: it builds on existing methods and reports comparative improvements.
The paper tackles the challenge of data scarcity in multilingual NLP by comparing state-of-the-art encoders and proposing a new method for multilingual contextualized word embeddings, showing it performs at or above SOTA in zero-shot transfer and improves knowledge sharing in joint training.
The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP). Therefore, many recent studies focus on zero-shot transfer learning and joint training across languages to overcome data scarcity for low-resource languages. In this work we (i) perform a comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part of speech (POS) tagging; and (ii) propose a new method for creating multilingual contextualized word embeddings, compare it to multiple baselines and show that it performs at or above state-of-the-art level in zero-shot transfer settings. Finally, we show that our method allows for better knowledge sharing across languages in a joint training setting.