CL · AI · LG · Dec 2, 2022

Nonparametric Masked Language Modeling

Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

Meta AI, UW

arXiv:2212.01349v223.2245 citations · h-index: 116 · Has Code

Originality: Highly original

AI Analysis

This paper addresses rare-token prediction in language modeling. It offers a novel approach that improves performance on tasks such as classification and question answering, though it is an incremental advance over existing retrieve-and-generate methods.

The paper tackles the problem of predicting rare tokens or phrases in language models by introducing NPM, a nonparametric masked language model that retrieves tokens from a reference corpus instead of applying a softmax over a finite vocabulary. In zero-shot evaluation on 16 tasks, NPM outperforms significantly larger parametric models and is particularly strong on rare patterns and rare words.

Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
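The idea in the abstract can be illustrated with a toy sketch. This is not the released NPM code: the function names, the 2-dimensional vectors, the dot-product similarity, the temperature parameter, and the token-level (rather than phrase-level) scoring are all assumptions made for illustration. It shows a [MASK] query scored against every token occurrence in a small reference corpus, with probabilities summed per token string, plus one common form of contrastive loss that treats the current batch as the whole corpus.

```python
import numpy as np

def nonparametric_fill(query, corpus_vecs, corpus_tokens, temperature=1.0):
    """Fill a [MASK] by scoring every token occurrence in a reference corpus.

    Hypothetical helper: softmax over occurrence similarities, then sum the
    probability mass of occurrences that share the same token string.
    """
    scores = corpus_vecs @ query / temperature
    scores = scores - scores.max()          # numerical stability
    probs = np.exp(scores)
    probs = probs / probs.sum()
    totals = {}
    for tok, p in zip(corpus_tokens, probs):
        totals[tok] = totals.get(tok, 0.0) + float(p)
    best = max(totals, key=totals.get)
    return best, totals

def in_batch_contrastive_loss(query, batch_vecs, batch_tokens, target, temperature=1.0):
    """Contrastive loss with in-batch tokens standing in for the full corpus.

    Hypothetical form: -log(mass on occurrences of `target` / mass on all
    in-batch occurrences). The paper's exact objective may differ.
    """
    scores = batch_vecs @ query / temperature
    scores = scores - scores.max()
    exp_scores = np.exp(scores)
    positive = np.array([tok == target for tok in batch_tokens])
    if not positive.any():
        raise ValueError("target token does not occur in the batch")
    return float(-np.log(exp_scores[positive].sum() / exp_scores.sum()))

# Toy reference corpus: four token occurrences with made-up 2-d encodings.
corpus_tokens = ["Seattle", "Paris", "Seattle", "Thessaloniki"]
corpus_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
query = np.array([1.0, 0.0])               # made-up encoding of the [MASK] position

best, totals = nonparametric_fill(query, corpus_vecs, corpus_tokens)
loss_seattle = in_batch_contrastive_loss(query, corpus_vecs, corpus_tokens, "Seattle")
loss_paris = in_batch_contrastive_loss(query, corpus_vecs, corpus_tokens, "Paris")
print(best, totals, loss_seattle, loss_paris)
```

Because the prediction is a lookup over corpus occurrences, a rare word such as "Thessaloniki" is a candidate as soon as it appears once in the reference corpus; no output-vocabulary entry is needed. In this sketch the loss is lower when the query lies close to occurrences of the target token, which is the behavior a contrastive objective is meant to encourage.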
