CL, AI · Jan 23, 2023

Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning

arXiv:2301.09626v1 · 36 citations · h-index: 9
Originality: Incremental advance
AI Analysis

The work addresses the performance gap for languages with fewer compute and data resources, though it is incremental in that it builds on prior cross-lingual transfer work.

The paper tackles the problem of inefficient language model training for low-resource languages by introducing CLP-Transfer, a cross-lingual and progressive transfer learning method that saves up to 80% of training steps compared to random initialization.

Most Transformer language models are primarily pretrained on English text, limiting their use for other languages. As model sizes grow, the performance gap between English and other languages with fewer compute and data resources increases even further. Consequently, more resource-efficient training methods are needed to bridge the gap for languages with fewer resources available. To address this problem, we introduce a cross-lingual and progressive transfer learning approach, called CLP-Transfer, that transfers models from a source language for which pretrained models are publicly available, such as English, to a new target language. As opposed to prior work, which focused on cross-lingual transfer between two languages, we also extend the transfer across model sizes. Given a pretrained model in a source language, we aim for a same-sized model in a target language. Instead of training that model from scratch, we exploit a smaller model that is already in the target language and requires far fewer resources to train. Both the small target-language model and the source model are then used to initialize the token embeddings of the larger model, based on the overlapping vocabulary of the source and target languages. All remaining weights are reused from the model in the source language. This approach outperforms cross-lingual transfer alone and can save up to 80% of the training steps compared to random initialization.
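The embedding initialization described in the abstract can be illustrated with a minimal sketch. The abstract only states that both models and the overlapping vocabulary are used; the specific rule below (copy the source embedding for shared tokens, and build each remaining token from a similarity-weighted average of shared-token embeddings, with similarities measured in the small model's space and negative values clipped) is an assumption made for illustration, as are the function names `cosine` and `init_target_embeddings` and the toy vocabularies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def init_target_embeddings(source_large, target_small):
    """Hypothetical helper: build embeddings for a large target-language model.

    source_large: token -> vector from the large source-language model
    target_small: token -> vector from the small target-language model
    Returns token -> vector covering the target vocabulary, in the
    dimensionality of the large source model.
    """
    overlap = [tok for tok in target_small if tok in source_large]
    if not overlap:
        raise ValueError("source and target vocabularies share no tokens")
    dim = len(source_large[overlap[0]])

    result = {}
    for token, small_vec in target_small.items():
        if token in source_large:
            # Shared token: reuse the large source-model embedding directly.
            result[token] = list(source_large[token])
            continue
        # Target-only token: weight each shared token by its similarity to
        # this token in the small model's space (assumption: clip negatives).
        weights = [max(cosine(small_vec, target_small[o]), 0.0) for o in overlap]
        total = sum(weights)
        if total == 0.0:
            # Fall back to a plain average if nothing is similar.
            weights = [1.0] * len(overlap)
            total = float(len(overlap))
        result[token] = [
            sum(w * source_large[o][d] for w, o in zip(weights, overlap)) / total
            for d in range(dim)
        ]
    return result


# Toy example: 3-dimensional "large" source model, 2-dimensional "small" target model.
source_large = {"the": [1.0, 0.0, 0.0], "in": [0.0, 1.0, 0.0], "cat": [0.0, 0.0, 1.0]}
target_small = {"the": [1.0, 0.0], "in": [0.0, 1.0], "der": [1.0, 0.0]}
emb = init_target_embeddings(source_large, target_small)
```

In this toy setup, "der" exists only in the target vocabulary and sits next to "the" in the small model's space, so it inherits the large-model embedding of "the". In the approach described above, all non-embedding weights would then be copied from the source-language model before training continues on target-language text.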

Code Implementations: 1 repo
