CL · AI · LG · May 28, 2021

Knowledge Inheritance for Pre-trained Language Models

Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, Zhengyan Zhang, Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

arXiv:2105.13880v2 · 30.76 · 43 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the high resource cost of PLM training that researchers and practitioners face. It is an incremental advance, however, because it builds on existing knowledge distillation techniques.

The paper tackles the high computational cost of training large-scale pre-trained language models (PLMs) by proposing a knowledge inheritance (KI) framework, in which knowledge distilled from existing PLMs serves as auxiliary supervision during pre-training. Experiments show that this improves training efficiency.

Recent explorations of large-scale pre-trained language models (PLMs) have revealed the power of PLMs with huge numbers of parameters, setting off a wave of training ever-larger PLMs. However, training a large-scale PLM requires tremendous computational resources, which may be practically unaffordable. In addition, existing large-scale PLMs are mainly trained from scratch individually, ignoring the fact that many well-trained PLMs are already available. To this end, we explore the question of how existing PLMs could benefit the training of future large-scale PLMs. Specifically, we introduce a pre-training framework named "knowledge inheritance" (KI) and explore how knowledge distillation could serve as auxiliary supervision during pre-training to efficiently learn larger PLMs. Experimental results demonstrate the superiority of KI in training efficiency. We also conduct empirical analyses to explore the effects of teacher PLMs' pre-training settings, including model architecture, pre-training data, etc. Finally, we show that KI could be applied to domain adaptation and knowledge transfer.
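The abstract does not spell out the training objective, so the following is only a minimal sketch of how knowledge distillation can act as auxiliary supervision during pre-training, assuming a common formulation. For one masked position, the student's loss mixes its ordinary masked-LM cross-entropy with a KL term that pulls it toward a teacher PLM's output distribution, and a weight `alpha` decays to zero so that the larger student eventually trains on its own. The linear schedule and the names `ki_loss`, `alpha_schedule`, `total_inherit_steps`, and `temperature` are illustrative assumptions, not the authors' code.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one vocabulary distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mlm_cross_entropy(student_logits: List[float], target_id: int) -> float:
    """Self-supervised loss for one masked token: -log p_student(target)."""
    return -math.log(softmax(student_logits)[target_id])


def kd_kl(teacher_logits: List[float], student_logits: List[float],
          temperature: float = 1.0) -> float:
    """KL(p_teacher || p_student), both softened by the same temperature."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def alpha_schedule(step: int, total_inherit_steps: int, alpha0: float = 1.0) -> float:
    """Hypothetical linear decay of the distillation weight from alpha0 to 0,
    reaching 0 at total_inherit_steps and staying there afterwards."""
    if step >= total_inherit_steps:
        return 0.0
    return alpha0 * (1.0 - step / total_inherit_steps)


def ki_loss(student_logits: List[float], teacher_logits: List[float],
            target_id: int, step: int, total_inherit_steps: int,
            temperature: float = 1.0) -> float:
    """Mixed objective (1 - alpha) * L_self + alpha * L_KD for one masked position."""
    alpha = alpha_schedule(step, total_inherit_steps)
    self_loss = mlm_cross_entropy(student_logits, target_id)
    inherit_loss = kd_kl(teacher_logits, student_logits, temperature)
    return (1.0 - alpha) * self_loss + alpha * inherit_loss


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]   # toy student logits over a 3-token vocabulary
    teacher = [1.5, 1.0, -0.5]   # toy teacher logits for the same position
    for step in (0, 500, 1000):
        loss = ki_loss(student, teacher, target_id=0, step=step, total_inherit_steps=1000)
        print(step, round(loss, 4))
```

At step 0 the loss is pure distillation, and from `total_inherit_steps` onward it reduces to the ordinary masked-LM loss. One plausible rationale for decaying the weight rather than keeping it constant is that a larger student should stop imitating a smaller teacher once it has the capacity to surpass it.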
