CL LG | Jun 10, 2020

MC-BERT: Efficient Language Pre-Training via a Meta Controller

Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu

arXiv:2006.05744v23.021 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficiency in NLP pre-training for researchers and practitioners, offering an incremental improvement over existing methods like ELECTRA.

The paper tackles the computational expense of large-scale language pre-training by proposing MC-BERT, a meta-learning framework whose pre-training task is a multi-choice cloze test with a reject option. Given the same computational budget, it outperforms baselines on GLUE semantic tasks.

Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to its reduced complexity of the pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides training input and candidates. Results over GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.
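The abstract does not spell out how the multi-choice cloze test is scored, so the following is only a minimal sketch of one plausible reading. It assumes that, at each position, the model scores a "reject" option (meaning the input token was not replaced) alongside the candidates supplied by the meta controller, and is trained with softmax cross-entropy. The names `REJECT`, `make_label`, and `cross_entropy` are hypothetical and are not taken from the paper or its code.

```python
import math

# Hypothetical label index meaning "the input token was not replaced".
REJECT = 0


def make_label(original_token, input_token, candidates):
    """Return the target index over [reject] + candidates for one position."""
    if input_token == original_token:
        return REJECT
    # Assumption: the controller always includes the original token
    # among the candidates whenever the input token was replaced.
    return 1 + candidates.index(original_token)


def cross_entropy(scores, label):
    """Softmax cross-entropy over k+1 scores, with the reject option first."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


# One position: the original token "cat" was replaced by "dog" in the input,
# and the controller proposes three candidates.
label = make_label("cat", "dog", ["cow", "cat", "car"])
loss = cross_entropy([0.1, 0.2, 2.0, -0.5], label)
```

Under this reading the classifier chooses among only k+1 options per position instead of the full vocabulary, which sits between ELECTRA's binary decision and masked language modeling in task complexity.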
