CL, LG, Jun 10, 2020

MC-BERT: Efficient Language Pre-Training via a Meta Controller

arXiv:2006.05744v2, 21 citations
AI Analysis

This work addresses efficiency in NLP pre-training for researchers and practitioners, offering an incremental improvement over existing methods like ELECTRA.

The paper tackles the computational expense of large-scale language pre-training by proposing MC-BERT, a meta-learning framework whose pre-training task is a multi-choice cloze test with a reject option. Given the same computational budget, MC-BERT outperforms baselines on GLUE semantic tasks.

Pre-trained contextual representations (e.g., BERT) have become the foundation for achieving state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to the reduced complexity of its pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides training input and candidates. Results on the GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.
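The abstract does not say how candidates are drawn or how the reject option is encoded, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. A random sampler stands in for the learned meta controller, and the names `build_example`, `REJECT`, `k`, and `replace_prob` are hypothetical. Under these assumptions each position becomes a (k+1)-way choice: pick the original token among k candidates, or reject when the shown token is already the original. That places the task between ELECTRA's 2-way decision and the vocabulary-sized decision of masked language modeling.

```python
import random

# Hypothetical label meaning "the shown token is original; no correction needed".
REJECT = 0


def build_example(tokens, vocab, k=3, replace_prob=0.3, rng=None):
    """Build one multi-choice-with-reject example (illustrative assumption).

    A random sampler plays the role of the meta controller; in the paper this
    is a learned network. Returns (shown_tokens, candidates, labels), where
    labels[i] is REJECT or the 1-based index of the original token inside
    candidates[i].
    """
    if rng is None:
        rng = random.Random(0)
    shown_tokens, candidates, labels = [], [], []
    for tok in tokens:
        others = [v for v in vocab if v != tok]
        if rng.random() < replace_prob:
            # Replaced position: show a wrong token and hide the original
            # among k candidates.
            shown = rng.choice(others)
            distractors = rng.sample([v for v in others if v != shown], k - 1)
            cands = distractors + [tok]
            rng.shuffle(cands)
            label = cands.index(tok) + 1
        else:
            # Untouched position: no candidate is the original, so the
            # correct answer is to reject.
            shown = tok
            cands = rng.sample(others, k)
            label = REJECT
        shown_tokens.append(shown)
        candidates.append(cands)
        labels.append(label)
    return shown_tokens, candidates, labels


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    shown, cands, labels = build_example(sentence, vocab, rng=random.Random(1))
    for s, c, y in zip(shown, cands, labels):
        print(s, c, y)
```

The vocabulary must contain at least k + 1 distinct tokens for the sampling calls to succeed. A main model trained on such examples would emit k + 1 logits per position, one per candidate plus one for the reject class.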

Code Implementations: 1 repo