Towards Probabilistically-Sound Beam Search with Masked Language Models
This work addresses a domain-specific problem with applications such as ancient text restoration and protein engineering, though its contribution is an incremental improvement on existing beam search techniques.
The paper tackles the challenge of performing probabilistically-sound beam search with masked language models (MLMs), which, unlike autoregressive models, do not readily provide joint probability distributions over sequences. It clarifies when standard beam search is theoretically sound for text infilling and, for cases where it is not, introduces an inference-time modification that outperforms standard beam search under the expected conditions.
Beam search with masked language models (MLMs) is challenging in part because joint probability distributions over sequences are not readily available, unlike for autoregressive models. However, estimating such distributions has important domain-specific applications such as ancient text restoration and protein engineering. Here we present probabilistically-sound methods for beam search with MLMs. First, we clarify the conditions under which it is theoretically sound to perform text infilling with MLMs using standard beam search. When these conditions fail, we provide a probabilistically-sound inference-time modification with no additional computational complexity and demonstrate that it is superior to the aforementioned beam search under the expected conditions. We then present empirical results comparing several infilling approaches with MLMs across multiple domains. Notably, our method probes the inductive biases of MLMs and explores the surprising contextual sensitivity of mask tokens for text infilling.
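For readers unfamiliar with the baseline being discussed, the following is a minimal sketch of standard beam search for infilling with an MLM, not the paper's proposed modification. It assumes masked positions are filled left to right and that a completion is scored by summing the MLM's per-position log-probabilities, exactly as one would with an autoregressive model's chain-rule factors; whether that sum is a valid joint log-probability is the question the abstract raises. The function `toy_mlm_log_probs` is a hypothetical stand-in for a real MLM over a three-token vocabulary, and `beam_infill`, `MASK`, and `VOCAB` are likewise illustrative names.

```python
import math
from typing import Callable, Dict, List, Tuple

MASK = "[MASK]"
VOCAB = ["a", "b", "c"]

# Hypothetical MLM interface: log p(token at `pos` | current tokens).
MLMFn = Callable[[List[str], int], Dict[str, float]]


def toy_mlm_log_probs(tokens: List[str], pos: int) -> Dict[str, float]:
    """Toy stand-in for an MLM (assumption, not a real model).

    It prefers repeating the left neighbour when that neighbour is already
    filled, and is uniform otherwise. A real MLM would condition on the whole
    sequence, including tokens (and remaining masks) to the right.
    """
    left = tokens[pos - 1] if pos > 0 else None
    if left in VOCAB:
        probs = {t: (0.6 if t == left else 0.2) for t in VOCAB}
    else:
        probs = {t: 1.0 / len(VOCAB) for t in VOCAB}
    return {t: math.log(p) for t, p in probs.items()}


def beam_infill(
    tokens: List[str], mlm: MLMFn, beam_width: int
) -> List[Tuple[List[str], float]]:
    """Standard left-to-right beam search over the masked positions.

    Each beam's score is the running sum of the MLM's log-probabilities for
    the tokens it has filled in so far.
    """
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    beams: List[Tuple[List[str], float]] = [(list(tokens), 0.0)]
    for pos in masked:
        candidates: List[Tuple[List[str], float]] = []
        for seq, score in beams:
            # Query the model with the partially filled sequence.
            for tok, lp in mlm(seq, pos).items():
                new_seq = seq.copy()
                new_seq[pos] = tok
                candidates.append((new_seq, score + lp))
        # Keep the highest-scoring partial completions (stable on ties).
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


if __name__ == "__main__":
    for seq, score in beam_infill(["a", MASK, MASK], toy_mlm_log_probs, beam_width=2):
        print(seq, round(math.exp(score), 2))
```

The left-to-right order is itself a design choice made here for simplicity; with an MLM, any fill order is possible, and different orders can yield different scores for the same completion, which is one way to see why summed MLM conditionals need not define a consistent joint distribution.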