CL · Feb 24, 2022

Probing BERT's priors with serial reproduction chains

Takateru Yamakoshi, Thomas L. Griffiths, Robert D. Hawkins

arXiv:2202.12226v232.0640 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper provides a theoretical foundation for bottom-up probing of language models, addressing a specific bottleneck in NLP research: how to draw representative samples from masked language models.

The paper tackled the problem of generating representative samples from masked language models like BERT, which lack consistent conditional distributions, by using serial reproduction chains with a Generative Stochastic Network sampler. The result showed that sentences from these chains closely matched ground-truth corpus distributions and outperformed other methods in naturalness judgments.

Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to sample from BERT's priors. In particular, we observe that a unique and consistent estimator of the ground-truth joint distribution is given by a Generative Stochastic Network (GSN) sampler, which randomly selects which token to mask and reconstruct on each step. We show that the lexical and syntactic statistics of sentences from GSN chains closely match the ground-truth corpus distribution and perform better than other methods in a large corpus of naturalness judgments. Our findings establish a firmer theoretical foundation for bottom-up probing and highlight richer deviations from human priors.
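The sampling step described in the abstract (pick a random position, mask it, and reconstruct it from the model's conditional distribution) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_conditional` is a hypothetical stand-in for BERT's predicted distribution at a masked position, and the names `gsn_step` and `run_chain` are likewise assumptions made for this example.

```python
import random

MASK = "[MASK]"


def toy_conditional(tokens, position, vocab):
    """Hypothetical stand-in for a masked language model.

    Returns a dict mapping each vocabulary item to its probability of
    filling `position`. This toy rule simply disfavors repeating the
    left neighbor; a real MLM would condition on the whole context.
    """
    left = tokens[position - 1] if position > 0 else None
    weights = {w: (1.0 if w == left else 3.0) for w in vocab}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}


def gsn_step(tokens, conditional, vocab, rng):
    """One chain step: mask a uniformly random position and resample it."""
    position = rng.randrange(len(tokens))
    masked = list(tokens)
    masked[position] = MASK
    probs = conditional(masked, position, vocab)
    words = list(probs)
    masked[position] = rng.choices(
        words, weights=[probs[w] for w in words], k=1
    )[0]
    return masked


def run_chain(initial, conditional, vocab, n_steps, seed=0):
    """Run a serial reproduction chain and return every visited state."""
    rng = random.Random(seed)
    state = list(initial)
    history = [list(state)]
    for _ in range(n_steps):
        state = gsn_step(state, conditional, vocab, rng)
        history.append(list(state))
    return history


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "mat"]
    chain = run_chain(["the", "cat", "sat"], toy_conditional, vocab, n_steps=10)
    for state in chain:
        print(" ".join(state))
```

Swapping `toy_conditional` for a function that queries an actual masked language model would leave the chain logic unchanged; only the source of the per-position distribution differs.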
