Inducing Syntactic Trees from BERT Representations
This work addresses the challenge of extracting syntactic structure from pre-trained language models, a problem of interest to NLP researchers. The contribution is incremental, as it builds on existing BERT analysis methods.
The paper tackles the problem of inducing syntactic trees from BERT representations by analyzing how deleting a word affects the representations of the remaining words. It finds that deleting reducible words, such as adjectives, causes less change than deleting critical ones, such as the main verb, and uses this observation to estimate reducibilities and induce dependency trees.
We use the English model of BERT and explore how the deletion of one word in a sentence changes the representations of the other words. Our hypothesis is that removing a reducible word (e.g., an adjective) does not affect the representations of other words as much as removing, e.g., the main verb, which makes the sentence ungrammatical and of "high surprise" for the language model. We estimate the reducibilities of individual words and also of longer continuous phrases (word n-grams), study their syntax-related properties, and then use them to induce full dependency trees.
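The deletion experiment described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `deletion_effects`, the use of Euclidean distance, the averaging over the remaining words, and the two toy encoders, which merely stand in for BERT so that the example runs without a model download. In a real setup the `encode` argument would wrap a contextual model and would also need to handle subword tokenization, which this sketch ignores.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: one vector per input word.
Encoder = Callable[[Sequence[str]], List[Vector]]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def deletion_effects(words: Sequence[str], encode: Encoder) -> List[float]:
    """For each word, measure how much deleting it changes the
    representations of the remaining words (assumed metric: mean
    Euclidean distance). A small value suggests a reducible word."""
    full = encode(words)
    effects = []
    for i in range(len(words)):
        reduced_words = list(words[:i]) + list(words[i + 1:])
        if not reduced_words:
            # Nothing is left to compare against.
            effects.append(0.0)
            continue
        reduced = encode(reduced_words)
        # Vectors of the kept words as seen in the full sentence,
        # aligned position by position with the reduced sentence.
        kept = full[:i] + full[i + 1:]
        dists = [euclidean(u, v) for u, v in zip(kept, reduced)]
        effects.append(sum(dists) / len(dists))
    return effects


def _base(word: str) -> Vector:
    # Deterministic toy features; NOT a real embedding.
    return [float(len(word)), float(sum(map(ord, word)) % 7)]


def toy_contextual_encoder(words: Sequence[str]) -> List[Vector]:
    """Toy stand-in for a contextual model: each word vector is shifted
    by half the sentence mean, so it depends on the other words."""
    bases = [_base(w) for w in words]
    n = len(bases)
    mean = [sum(b[k] for b in bases) / n for k in range(2)]
    return [[b[k] + 0.5 * mean[k] for k in range(2)] for b in bases]


def context_free_encoder(words: Sequence[str]) -> List[Vector]:
    """Toy encoder that ignores context entirely."""
    return [_base(w) for w in words]


if __name__ == "__main__":
    sentence = ["the", "old", "dog", "barked"]
    for word, effect in zip(sentence, deletion_effects(sentence, toy_contextual_encoder)):
        print(f"{word}: {effect:.3f}")
```

With the context-free encoder every effect is zero, since deleting a word cannot change the vectors of the others. That is a useful sanity check before plugging in a real contextual model. The numbers produced by the toy contextual encoder carry no linguistic meaning; only the procedure is of interest.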