CL · Dec 16, 2021

Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

arXiv:2112.08583v1628 citations
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental question about how pre-trained language models learn, with implications for understanding their limitations on reasoning tasks. Its contribution is incremental: it clarifies an existing debate rather than opening a new one.

The study investigated whether masked language models like BERT acquire commonsense knowledge through systematic inference over the semantics of the pre-training corpus. It found that generalization to supported inferences does not improve over the course of pre-training, suggesting that acquisition relies on surface-level co-occurrence patterns rather than induced, systematic reasoning.

Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the minibatches of a BERT model during pre-training and evaluate how well the model generalizes to supported inferences. We find generalization does not improve over the course of pre-training, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
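The injection-and-probe setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `inject_knowledge` and `mask_last_word`, the choice to overwrite random batch slots, and the robin example are all assumptions made for illustration, and the actual masked-language-model training and scoring steps are omitted.

```python
import random
from typing import List, Tuple


def inject_knowledge(minibatch: List[str], facts: List[str],
                     rng: random.Random) -> List[str]:
    """Overwrite randomly chosen slots of a minibatch with verbalized facts.

    Hypothetical helper: the batch size is kept fixed so the rest of the
    pre-training step would be unchanged.
    """
    if len(facts) > len(minibatch):
        raise ValueError("more facts than batch slots")
    batch = list(minibatch)  # copy so the caller's batch is not mutated
    slots = rng.sample(range(len(batch)), k=len(facts))
    for slot, fact in zip(slots, facts):
        batch[slot] = fact
    return batch


def mask_last_word(sentence: str, mask_token: str = "[MASK]") -> Tuple[str, str]:
    """Turn a held-out inference into a cloze probe by masking its final word."""
    words = sentence.rstrip(".").split()
    return " ".join(words[:-1] + [mask_token]) + ".", words[-1]


# Illustrative data (assumed, not taken from the paper).
corpus_batch = ["sentence one.", "sentence two.", "sentence three.", "sentence four."]
premise = "A robin is a bird."          # verbalized knowledge to inject
supported_inference = "A robin can fly."  # never shown during training

train_batch = inject_knowledge(corpus_batch, [premise], random.Random(0))
probe, target = mask_last_word(supported_inference)
print(probe, "->", target)  # A robin can [MASK]. -> fly
```

In a full experiment, `train_batch` would be fed to the masked-language-modeling objective at a chosen point in pre-training, and the model's probability of `target` at the masked position in `probe` would be compared before and after the update. Under the paper's finding, that comparison would not be expected to improve as pre-training progresses.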

