CL · Oct 21, 2022

InforMask: Unsupervised Informative Masking for Language Model Pretraining

arXiv:2210.11771v124.3297 citations · h-index: 72 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses inefficient pretraining for natural language understanding by proposing a more effective masking strategy. The advance is incremental because it builds on existing masked language modeling approaches.

The paper tackles the suboptimal random masking used in language model pretraining by proposing InforMask, an unsupervised masking strategy that uses Pointwise Mutual Information (PMI) to select informative tokens. InforMask outperforms random masking and previously proposed masking strategies on the factual recall benchmark LAMA and the question answering benchmarks SQuAD v1 and v2.

Masked language modeling is widely used for pretraining large language models for natural language understanding (NLU). However, random masking is suboptimal, allocating an equal masking rate for all tokens. In this paper, we propose InforMask, a new unsupervised masking strategy for training masked language models. InforMask exploits Pointwise Mutual Information (PMI) to select the most informative tokens to mask. We further propose two optimizations for InforMask to improve its efficiency. With a one-off preprocessing step, InforMask outperforms random masking and previously proposed masking strategies on the factual recall benchmark LAMA and the question answering benchmark SQuAD v1 and v2.
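The abstract does not spell out how PMI scores are turned into masking decisions, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes PMI is estimated from sentence-level co-occurrence, PMI(a, b) = log(p(a, b) / (p(a) p(b))), that a token's informativeness is the sum of its PMI with the other tokens in the same sentence, and that the highest-scoring tokens are masked. The names `pmi_table`, `informative_mask`, `mask_rate`, and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Estimate PMI for unordered token pairs from sentence-level co-occurrence.

    Assumption: probabilities are the fraction of sentences containing a token
    (or both tokens of a pair). This is a toy estimator for illustration.
    """
    n = len(sentences)
    tok_count = Counter()
    pair_count = Counter()
    for sent in sentences:
        types = sorted(set(sent))                 # count each type once per sentence
        tok_count.update(types)
        pair_count.update(combinations(types, 2))  # keys are (a, b) with a < b
    pmi = {}
    for (a, b), c in pair_count.items():
        p_ab = c / n
        p_a = tok_count[a] / n
        p_b = tok_count[b] / n
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


def informative_mask(sent, pmi, mask_rate=0.15, mask_token="[MASK]"):
    """Mask the tokens whose summed PMI with the rest of the sentence is highest."""
    def pair(a, b):
        # Unseen pairs contribute 0 (an assumption of this sketch).
        return pmi.get((a, b) if a < b else (b, a), 0.0)

    scores = [
        sum(pair(t, u) for j, u in enumerate(sent) if j != i)
        for i, t in enumerate(sent)
    ]
    k = max(1, round(mask_rate * len(sent)))       # mask at least one token
    top = set(sorted(range(len(sent)), key=lambda i: scores[i], reverse=True)[:k])
    return [mask_token if i in top else t for i, t in enumerate(sent)]


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["the", "capital", "of", "france", "is", "paris"],
    ["the", "capital", "of", "italy", "is", "rome"],
    ["the", "river", "in", "paris", "is", "long"],
    ["the", "food", "in", "rome", "is", "good"],
]
table = pmi_table(corpus)
masked = informative_mask(corpus[0], table)
print(masked)
```

In this toy corpus, "the" and "is" appear in every sentence, so their PMI with any other token is zero and they are never preferred for masking, whereas random masking would select them as often as any content word.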
