CL · May 24, 2023

Deriving Language Models from Masked Language Models

arXiv:2305.15501v1
Originality: Incremental advance
AI Analysis

The paper addresses a foundational issue in natural language processing for researchers and practitioners who use MLMs. It is an incremental advance: it builds on prior methods for deriving explicit distributions from MLMs and improves on them.

The paper tackles the fact that masked language models (MLMs) do not explicitly define a distribution over language. It studies methods for deriving explicit joint distributions from MLMs and finds that an approach based on matching conditionals works well and outperforms existing methods. The derived model's conditionals sometimes even outperform the original MLM's conditionals.
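The conditional-matching idea can be illustrated on a toy two-token setup. The sketch below is an illustration under stated assumptions, not the paper's implementation: it takes two hypothetical tables, `q1[a][b]` standing for the MLM's p(x1=a | x2=b) and `q2[a][b]` standing for p(x2=b | x1=a), parameterizes a joint by a table of logits, and runs gradient descent on an unweighted sum of KL divergences between the MLM's conditionals and the joint's conditionals. The paper's actual objective may weight contexts differently, and the function names here are hypothetical.

```python
import math


def softmax_cols(theta):
    """Return p(a | b): normalize each column b of the logit table over a."""
    n, m = len(theta), len(theta[0])
    out = [[0.0] * m for _ in range(n)]
    for b in range(m):
        z = max(theta[a][b] for a in range(n))  # subtract max for stability
        exps = [math.exp(theta[a][b] - z) for a in range(n)]
        s = sum(exps)
        for a in range(n):
            out[a][b] = exps[a] / s
    return out


def softmax_rows(theta):
    """Return p(b | a) stored as out[a][b]: normalize each row a over b."""
    out = []
    for row in theta:
        z = max(row)
        exps = [math.exp(t - z) for t in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def fit_joint(q1, q2, lr=1.0, steps=2000):
    """Hypothetical conditional-matching sketch (unweighted KL variant).

    q1[a][b]: assumed MLM conditional p(x1=a | x2=b).
    q2[a][b]: assumed MLM conditional p(x2=b | x1=a).
    Minimizes sum_b KL(q1(.|b) || p(.|b)) + sum_a KL(q2(.|a) || p(.|a))
    over joint logits theta by plain gradient descent.
    """
    n, m = len(q1), len(q1[0])
    theta = [[0.0] * m for _ in range(n)]
    for _ in range(steps):
        pc = softmax_cols(theta)  # joint's p(a | b)
        pr = softmax_rows(theta)  # joint's p(b | a)
        for a in range(n):
            for b in range(m):
                # Softmax cross-entropy gradient for each of the two KL terms.
                grad = (pc[a][b] - q1[a][b]) + (pr[a][b] - q2[a][b])
                theta[a][b] -= lr * grad
    # The joint p(a, b) is the softmax over all n * m logits.
    z = max(max(row) for row in theta)
    w = [[math.exp(t - z) for t in row] for row in theta]
    total = sum(sum(row) for row in w)
    return [[x / total for x in row] for row in w]
```

When the two tables are consistent (both computed from one strictly positive joint), this objective can reach zero and the fitted joint recovers that joint. When they are inconsistent, as MLM conditionals generally are, the result is a joint whose conditionals are as close to the MLM's as this particular objective allows. The per-logit gradient reduces to (p(a|b) - q1(a|b)) + (p(b|a) - q2(b|a)), which keeps the update loop short.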

Masked language models (MLMs) do not explicitly define a distribution over language, i.e., they are not language models per se. However, recent work has implicitly treated them as such for the purposes of generation and scoring. This paper studies methods for deriving explicit joint distributions from MLMs, focusing on distributions over two tokens, which makes it possible to calculate exact distributional properties. We find that an approach based on identifying joints whose conditionals are closest to those of the MLM works well and outperforms existing Markov random field-based approaches. We further find that this derived model's conditionals can even occasionally outperform the original MLM's conditionals.
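Because the joint is over only two tokens, quantities such as the joint's own conditionals can be computed exactly by enumeration. The sketch below assumes one simple MRF-style construction for illustration: the joint is taken proportional to the product q1(a|b)·q2(b|a) of the two MLM conditionals. It then measures exactly how far the joint's conditionals are from the MLM's. The names `mrf_joint`, `conditionals`, and `conditional_kl` are hypothetical, and the construction is not claimed to match any specific MRF variant in the paper.

```python
import math


def mrf_joint(q1, q2):
    """Hypothetical MRF-style joint: p(a, b) proportional to q1(a|b) * q2(b|a).

    q1[a][b]: assumed MLM conditional p(x1=a | x2=b).
    q2[a][b]: assumed MLM conditional p(x2=b | x1=a).
    """
    n, m = len(q1), len(q1[0])
    w = [[q1[a][b] * q2[a][b] for b in range(m)] for a in range(n)]
    total = sum(sum(row) for row in w)
    return [[x / total for x in row] for row in w]


def conditionals(p):
    """Exact p(a|b) and p(b|a) of a strictly positive joint table p[a][b]."""
    n, m = len(p), len(p[0])
    col = [sum(p[a][b] for a in range(n)) for b in range(m)]  # marginal of x2
    row = [sum(p[a]) for a in range(n)]                        # marginal of x1
    p_a_given_b = [[p[a][b] / col[b] for b in range(m)] for a in range(n)]
    p_b_given_a = [[p[a][b] / row[a] for b in range(m)] for a in range(n)]
    return p_a_given_b, p_b_given_a


def conditional_kl(q1, q2, p):
    """Sum over all contexts of KL(MLM conditional || joint's conditional)."""
    pc, pr = conditionals(p)
    n, m = len(p), len(p[0])
    total = 0.0
    for b in range(m):
        total += sum(q1[a][b] * math.log(q1[a][b] / pc[a][b])
                     for a in range(n) if q1[a][b] > 0)
    for a in range(n):
        total += sum(q2[a][b] * math.log(q2[a][b] / pr[a][b])
                     for b in range(m) if q2[a][b] > 0)
    return total
```

One property this toy exposes: even when `q1` and `q2` are the consistent conditionals of a single joint P, the product construction does not return P in general, because q1(a|b)·q2(b|a) = P(a,b)^2 / (P(a)P(b)). The exact KL check makes such mismatches directly measurable.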

Code Implementations: 1 repo
