CL · Dec 15, 2022

Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking

arXiv:2212.07617v1294 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses pre-training efficiency, a practical concern for NLP practitioners, though it is an incremental advance that builds on existing curriculum learning approaches.

The paper tackles the high training cost of masked language modeling (MLM) pre-training by proposing a concept-based curriculum masking (CCM) method, which achieves comparable performance to BERT on the GLUE benchmark at half the training cost.

Masked language modeling (MLM) has been widely used for pre-training effective bidirectional representations, but it incurs substantial training costs. In this paper, we propose a novel concept-based curriculum masking (CCM) method to efficiently pre-train a language model. CCM differs from existing curriculum learning approaches in two key ways, both designed to reflect the nature of MLM. First, we introduce a carefully designed linguistic difficulty criterion that evaluates the MLM difficulty of each token. Second, we construct a curriculum that gradually masks words related to the previously masked words, retrieving those related words from a knowledge graph. Experimental results show that CCM significantly improves pre-training efficiency. Specifically, the model trained with CCM shows performance comparable to the original BERT on the General Language Understanding Evaluation (GLUE) benchmark at half of the training cost.
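The curriculum idea in the abstract, masking words related to previously masked words via a knowledge graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the seed concept, the one-hop expansion per stage, and the names `build_curriculum` and `mask_tokens` are all assumptions made for illustration, and the paper's linguistic difficulty criterion is not modeled here.

```python
import random

MASK = "[MASK]"

def build_curriculum(graph, seed_concepts, num_stages):
    """Return one concept set per stage (hypothetical helper).

    Stage 0 holds the seed concepts. Each later stage adds the
    one-hop graph neighbours of every concept in the previous stage,
    so the set of maskable words grows gradually.
    """
    stages = [set(seed_concepts)]
    for _ in range(num_stages - 1):
        current = stages[-1]
        expanded = set(current)
        for concept in current:
            expanded.update(graph.get(concept, ()))
        stages.append(expanded)
    return stages

def mask_tokens(tokens, concepts, mask_prob=1.0, rng=None):
    """Replace tokens found in `concepts` with MASK, each with probability mask_prob."""
    rng = rng or random.Random(0)
    return [
        MASK if tok in concepts and rng.random() < mask_prob else tok
        for tok in tokens
    ]

# Toy knowledge graph (assumed for illustration): concept -> related concepts.
toy_graph = {
    "dog": ["animal", "bark"],
    "animal": ["zoo"],
    "bark": ["tree"],
}

stages = build_curriculum(toy_graph, seed_concepts=["dog"], num_stages=3)
sentence = "the dog began to bark at the zoo".split()

for i, concepts in enumerate(stages):
    print(i, " ".join(mask_tokens(sentence, concepts)))
```

With this toy graph, the first stage masks only "dog", the second also masks "bark", and the third adds "zoo", so each stage introduces words connected to those already seen. A real pipeline would additionally need tokenizer-aware matching of multi-word concepts and a rule for choosing the seed concepts, neither of which is covered by this sketch.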

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.