CL · Dec 10, 2020

As Good as New. How to Successfully Recycle English GPT-2 to Make Models for Other Languages

arXiv:2012.05628v3 · 727 citations
AI Analysis

This method provides a computationally efficient way for researchers and developers to create generative language models for under-resourced languages by leveraging existing English models, addressing data and computational limitations.

This paper addresses the challenge of adapting large English generative language models to other languages, specifically Italian and Dutch. It does so by retraining only the lexical embeddings of English GPT-2 and leaving its Transformer layers untouched. The resulting models generate realistic Italian and Dutch sentences. Human judges rate these sentences on par with those from a GPT-2 model trained from scratch. The approach also minimizes training and preserves what the English model learned during pre-training.
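The core idea is to keep the pre-trained body frozen and update only the token embedding matrix. The toy NumPy sketch below illustrates that idea under several simplifying assumptions. A single frozen linear map `W` stands in for GPT-2's Transformer layers. Input and output embeddings are tied, as they are in GPT-2. The "new language" is a made-up token stream. The helper names (`loss_and_grad_E`, `retrain_embeddings`) are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad_E(E, W, x, y):
    """Mean next-token cross-entropy and its gradient w.r.t. E only.

    Toy model (assumption): h = E[x] @ W, logits = h @ E.T, i.e. tied
    input/output embeddings around a frozen body W. No gradient is
    computed for W, mirroring "do not tune the Transformer layers".
    """
    n = len(x)
    h = E[x] @ W                        # (n, d) hidden states from the frozen body
    p = softmax(h @ E.T)                # (n, V) next-token distribution
    loss = -np.log(p[np.arange(n), y]).mean()
    g = p.copy()
    g[np.arange(n), y] -= 1.0
    g /= n                              # dLoss/dlogits
    grad = g.T @ h                      # gradient via the output (unembedding) side
    np.add.at(grad, x, (g @ E) @ W.T)   # gradient via the input lookup E[x]
    return loss, grad

def retrain_embeddings(W, x, y, vocab_size, steps=300, lr=0.5, seed=0):
    """Learn fresh embeddings for a new vocabulary while W stays fixed."""
    rng = np.random.default_rng(seed)
    d = W.shape[0]
    # Random initialisation of the new-language embeddings (an assumption).
    E = rng.normal(scale=0.3, size=(vocab_size, d))
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad_E(E, W, x, y)
        losses.append(loss)
        E -= lr * grad                  # only the embedding matrix is updated
    return E, losses

# Usage on a toy "new language" where token t is followed by (t + 1) % V.
rng = np.random.default_rng(1)
d, V = 16, 30
W = rng.normal(scale=d ** -0.5, size=(d, d))   # stand-in for the pre-trained body
W_before = W.copy()
tokens = np.arange(200) % V
x, y = tokens[:-1], tokens[1:]
E, losses = retrain_embeddings(W, x, y, V)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a real GPT-2 setup the same effect comes from disabling gradients on every parameter except the token embedding table. Because GPT-2 ties that table to its output layer, updating it also updates the unembedding.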

Large generative language models have been very successful for English, but other languages lag behind, in part due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings. Additionally, we scale up model complexity by transforming the relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents the loss, during adaptation, of information that GPT-2 has already learned. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian and Dutch. Though on average these sentences are still identifiable as artificial by humans, they are assessed on par with sentences generated by a GPT-2 model fully trained from scratch.
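The abstract says the relearned GPT-2 small embeddings are transformed into the GPT-2 medium embedding space but does not spell out the transformation. The sketch below assumes an ordinary least-squares linear map. The map is fitted on English tokens that both checkpoints share, since GPT-2 small (768 dimensions) and medium (1024 dimensions) use the same vocabulary. It is then applied to the new-language rows. The data are synthetic and the dimensions are shrunk, and `fit_linear_map` is a hypothetical helper, so this is not necessarily the paper's exact procedure.

```python
import numpy as np

def fit_linear_map(src, tgt):
    """Least-squares matrix M minimising ||src @ M - tgt||_F.

    src: (n, d_small)  embeddings of shared anchor tokens in the small model
    tgt: (n, d_medium) embeddings of the same tokens in the medium model
    """
    M, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return M  # shape (d_small, d_medium)

rng = np.random.default_rng(0)
d_small, d_medium = 8, 12   # toy stand-ins for 768 (small) and 1024 (medium)
n_en, n_new = 200, 50       # English anchor tokens / new-language tokens

# Synthetic data: in this toy the two spaces are related by an exact linear map.
M_true = rng.normal(size=(d_small, d_medium))
en_small = rng.normal(size=(n_en, d_small))      # English rows, small model
en_medium = en_small @ M_true                    # English rows, medium model
new_small = rng.normal(size=(n_new, d_small))    # relearned Italian/Dutch rows, small model

M = fit_linear_map(en_small, en_medium)
new_medium = new_small @ M   # new-language embeddings projected into the medium space
print(new_medium.shape)      # (50, 12)
```

With real embeddings the fit is only approximate, so the projected rows serve as a starting point in the larger model's space rather than an exact counterpart.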

Code Implementations: 1 repo