Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking
This work addresses pre-training efficiency, a practical concern for NLP practitioners, though the contribution is incremental in that it builds on existing curriculum learning approaches.
The paper tackles the high training cost of masked language modeling (MLM) pre-training by proposing a concept-based curriculum masking (CCM) method, which achieves comparable performance to BERT on the GLUE benchmark at half the training cost.
Masked language modeling (MLM) has been widely used for pre-training effective bidirectional representations, but it incurs substantial training costs. In this paper, we propose a novel concept-based curriculum masking (CCM) method to efficiently pre-train a language model. CCM differs from existing curriculum learning approaches in two key ways that reflect the nature of MLM. First, we introduce a carefully designed linguistic difficulty criterion that evaluates the MLM difficulty of each token. Second, we construct a curriculum that gradually masks words related to the previously masked words, which are retrieved from a knowledge graph. Experimental results show that CCM significantly improves pre-training efficiency. Specifically, the model trained with CCM achieves performance comparable to the original BERT on the General Language Understanding Evaluation (GLUE) benchmark at half of the training cost.
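The curriculum idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph `TOY_GRAPH`, the helper names `build_curriculum` and `mask_tokens`, the one-hop expansion rule, and the seed concepts are all assumptions made for illustration. In particular, the abstract does not say how the difficulty criterion selects the initial concepts, so the seed set is simply passed in.

```python
import random

# Hypothetical toy knowledge graph: concept -> related concepts.
# A real system would query an actual knowledge graph instead.
TOY_GRAPH = {
    "dog": ["animal", "bark"],
    "animal": ["zoo"],
    "bark": ["tree"],
    "zoo": [],
    "tree": [],
}


def build_curriculum(seed_concepts, graph, num_stages):
    """Return one set of maskable concepts per stage.

    Stage 0 holds the seed concepts (assumed to be the "easy" ones).
    Each later stage adds the graph neighbours of the previous stage,
    so the maskable vocabulary grows gradually.
    """
    stages = [set(seed_concepts)]
    for _ in range(num_stages - 1):
        previous = stages[-1]
        expanded = set(previous)
        for concept in previous:
            expanded.update(graph.get(concept, []))
        stages.append(expanded)
    return stages


def mask_tokens(tokens, maskable, mask_prob=1.0, mask_token="[MASK]", rng=None):
    """Replace tokens that belong to `maskable` with `mask_token`.

    Each eligible token is masked with probability `mask_prob`.
    """
    rng = rng or random.Random(0)
    return [
        mask_token if tok in maskable and rng.random() < mask_prob else tok
        for tok in tokens
    ]


stages = build_curriculum(["dog"], TOY_GRAPH, num_stages=3)
sentence = "the dog saw a tree".split()
early = mask_tokens(sentence, stages[0])  # only seed concepts are masked
late = mask_tokens(sentence, stages[2])   # related concepts are masked too
```

In this sketch the early stage masks only `dog`, while the last stage also masks `tree`, which is reachable from `dog` through `bark`. A real pipeline would presumably work on multi-token concepts and a much lower masking probability; both are simplified away here.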