CL · Jun 4, 2019

Open Sesame: Getting Inside BERT's Linguistic Knowledge

arXiv:1906.01698v1 · 192 citations
Originality: Synthesis-oriented
AI Analysis

The paper gives NLP researchers insight into BERT's internal mechanisms, but its contribution is incremental: it builds on prior work on contextual representations.

The paper investigates how BERT encodes linguistic knowledge. It finds that BERT switches from positional encoding on lower layers to hierarchical encoding on higher layers, though it is less sensitive to hierarchical structure than humans are when processing reflexive anaphora.

How and to what extent does BERT encode syntactically-sensitive hierarchical information or positionally-sensitive linear information? Recent work has shown that contextual representations like BERT perform well on tasks that require sensitivity to linguistic structure. We present here two studies which aim to provide a better understanding of the nature of BERT's representations. The first of these focuses on the identification of structurally-defined elements using diagnostic classifiers, while the second explores BERT's representation of subject-verb agreement and anaphor-antecedent dependencies through a quantitative assessment of self-attention vectors. In both cases, we find that BERT encodes positional information about word tokens well on its lower layers, but switches to a hierarchically-oriented encoding on higher layers. We conclude then that BERT's representations do indeed model linguistically relevant aspects of hierarchical structure, though they do not appear to show the sharp sensitivity to hierarchical structure that is found in human processing of reflexive anaphora.
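To make the attention-based study more concrete, here is a minimal sketch of one way to quantify how much attention a verb places on its true subject rather than on a linearly closer distractor noun, layer by layer. It is an illustration only and not the paper's actual metric: the function names (`attention_share`, `share_by_layer`), the toy sentence, and every attention value below are invented for the example. With a real model, the rows would come from the model's self-attention weights, averaged over heads.

```python
from typing import List, Sequence


def attention_share(attn_row: Sequence[float], target: int,
                    distractors: Sequence[int]) -> float:
    """Share of candidate attention mass that lands on the target token.

    attn_row    -- attention weights from one query token over all positions
    target      -- index of the structurally correct token (e.g. the subject)
    distractors -- indices of competing tokens (e.g. an intervening noun)
    """
    candidates = [target, *distractors]
    total = sum(attn_row[i] for i in candidates)
    if total == 0.0:
        raise ValueError("no attention mass on any candidate position")
    return attn_row[target] / total


def share_by_layer(rows_by_layer: Sequence[Sequence[float]], target: int,
                   distractors: Sequence[int]) -> List[float]:
    """Apply attention_share to the query token's row at every layer."""
    return [attention_share(row, target, distractors) for row in rows_by_layer]


# Hypothetical sentence: "the dog near the cats barks"
TOKENS = ["the", "dog", "near", "the", "cats", "barks"]
SUBJECT, ATTRACTOR = 1, 4  # "dog" is the subject, "cats" the distractor

# Invented attention rows for the verb "barks", ordered lower to higher layer.
VERB_ROWS = [
    [0.05, 0.10, 0.10, 0.15, 0.40, 0.20],  # favours the linearly closer noun
    [0.05, 0.30, 0.10, 0.05, 0.30, 0.20],  # no preference
    [0.05, 0.45, 0.10, 0.05, 0.15, 0.20],  # favours the structural subject
]

shares = share_by_layer(VERB_ROWS, SUBJECT, [ATTRACTOR])
for layer, share in enumerate(shares):
    print(f"layer {layer}: share on subject = {share:.2f}")
```

In this toy data the share on the subject rises with depth, which is the qualitative pattern the abstract describes: positional preference on lower layers and hierarchical preference on higher ones. Whether real BERT attention behaves this way for a given sentence is an empirical question the sketch does not answer.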

Code Implementations: 1 repo