CL, AI, LG | Sep 4, 2021

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

arXiv:2109.01819v1667 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more efficient and intuitive pretraining methods in natural language processing, though it is incremental as it explores alternatives within existing frameworks.

The paper asks whether masked language modeling (MLM) can be replaced with simpler pretraining objectives. It finds that five token-level classification tasks achieve performance comparable to or better than MLM on the GLUE and SQuAD benchmarks using BERT-BASE, and that a smaller model (BERT-MEDIUM) shows only a 1% drop in GLUE scores with the best objective.

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder, framed as multi-class classification over the entire vocabulary. During pretraining, MLM is commonly combined with auxiliary token-level or sequence-level objectives (e.g. next sentence prediction) to improve downstream performance. However, no previous work has examined whether other, simpler objectives, linguistically intuitive or not, can be used standalone as the main pretraining objective. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve performance comparable to or better than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining BERT-MEDIUM, a model with 41% of BERT-BASE's parameters, results in only a 1% drop in GLUE scores with our best objective.
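To make the contrast in the abstract concrete, here is a minimal sketch of how training examples might be built for MLM versus a token-level classification objective. It is illustrative only and is not taken from the paper's code: the function names, the 15% corruption rate, and the "was this token moved?" task are all assumptions, and that task is not claimed to be one of the paper's five objectives. The point is the size of the label space: MLM targets are vocabulary items, while the classification task needs only a binary label per token.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Build an MLM-style example (hypothetical helper).

    Returns (masked_tokens, labels). labels[i] is the original token at
    masked positions and None elsewhere, so a loss would only be computed
    at masked positions, over the whole vocabulary.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def shuffle_detection_example(tokens, shuffle_prob=0.15, seed=0):
    """Build a binary token-level classification example (hypothetical task).

    A random subset of positions is selected and the tokens at those
    positions are permuted among themselves. labels[i] is 1 if the token
    now at position i differs from the original, else 0.
    """
    rng = random.Random(seed)
    positions = [i for i in range(len(tokens)) if rng.random() < shuffle_prob]
    permuted = positions[:]
    rng.shuffle(permuted)
    corrupted = list(tokens)
    for src, dst in zip(positions, permuted):
        corrupted[dst] = tokens[src]
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    print(mask_tokens(sentence, mask_prob=0.3, seed=1))
    print(shuffle_detection_example(sentence, shuffle_prob=0.5, seed=1))
```

In this sketch the MLM labels require an output layer as wide as the vocabulary, whereas the binary task needs a two-way classifier per token, which is one sense in which such objectives are simpler.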

Code Implementations: 1 repo
