What makes multilingual BERT multilingual?
This work provides incremental insights into cross-lingual transfer for NLP researchers.
The study investigated factors influencing the cross-lingual transfer ability of multilingual BERT, finding that data size and context window size are crucial for transferability.
Multilingual BERT has recently been shown to work remarkably well on cross-lingual transfer tasks, outperforming static non-contextualized word embeddings. In this work, we provide an in-depth experimental study to supplement the existing literature on cross-lingual ability. We compare the cross-lingual ability of non-contextualized and contextualized representation models trained on the same data. We find that data size and context window size are crucial factors for transferability.