Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning
This work addresses a specific problem in NLP that matters to researchers and practitioners working with multilingual models, though its contribution is incremental, since it builds on existing fine-tuning and continual learning techniques.
The paper tackles the problem that fine-tuning a pre-trained cross-lingual model weakens its cross-lingual ability. It uses continual learning to preserve this ability during fine-tuning, which yields better results on sentence retrieval and on zero-shot cross-lingual part-of-speech tagging and named entity recognition.
Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) on downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and weakens its cross-lingual ability, which leads to sub-optimal performance. To alleviate this problem, we leverage continual learning to preserve the original cross-lingual ability of the pre-trained model when we fine-tune it on downstream tasks. Experimental results show that our fine-tuning methods better preserve the cross-lingual ability of the pre-trained model, as measured on a sentence retrieval task. Our methods also outperform other fine-tuning baselines on zero-shot cross-lingual part-of-speech tagging and named entity recognition.
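The abstract does not say which continual learning technique is used, so the following is only an illustrative sketch of one common family: gradient projection in the style of averaged gradient episodic memory. It assumes the original cross-lingual ability is represented by a reference gradient computed on some auxiliary objective (for example, a sentence retrieval loss), and that a fine-tuning step is allowed only if it does not increase that reference loss to first order. The function names `dot` and `project_gradient`, and the flat-list representation of gradients, are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_gradient(task_grad: List[float], ref_grad: List[float]) -> List[float]:
    """Return a fine-tuning gradient that does not conflict with ref_grad.

    task_grad: gradient of the downstream task loss (e.g., POS tagging).
    ref_grad:  gradient of an auxiliary loss standing in for the model's
               original cross-lingual ability (an assumption of this sketch).

    If the two gradients point in conflicting directions (negative inner
    product), the component of task_grad along ref_grad is removed, so the
    update no longer increases the auxiliary loss to first order.
    """
    overlap = dot(task_grad, ref_grad)
    if overlap >= 0:
        # No conflict: the downstream update is used unchanged.
        return list(task_grad)
    # overlap < 0 implies ref_grad is non-zero, so the division is safe.
    scale = overlap / dot(ref_grad, ref_grad)
    return [g - scale * r for g, r in zip(task_grad, ref_grad)]


# Toy example with two parameters.
conflicting = project_gradient([1.0, -1.0], [0.0, 1.0])
print(conflicting)  # [1.0, 0.0]

compatible = project_gradient([1.0, 1.0], [0.0, 1.0])
print(compatible)  # [1.0, 1.0]
```

In a real training loop the two gradients would come from back-propagating the downstream loss and the auxiliary loss through the same model, and the projected gradient would be written back to the parameters before the optimizer step. Simpler alternatives, such as penalizing the distance between the fine-tuned and pre-trained weights, trade this per-step guarantee for lower computational cost.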