CL · LG · Mar 2, 2022

Discontinuous Constituency and BERT: A Case Study of Dutch

arXiv:2203.01063v2640 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of evaluating syntactic understanding in language models, a question of interest to linguists and NLP researchers. Its contribution is incremental, since it builds on existing probing methods.

The paper tackled the problem of quantifying BERT's syntactic capacity for non-context-free patterns in Dutch, specifically control verb nesting and verb raising, and found that the models failed to implicitly acquire these dependencies.

In this paper, we set out to quantify the syntactic capacity of BERT in the evaluation regime of non-context-free patterns, as occurring in Dutch. We devise a test suite based on a mildly context-sensitive formalism, from which we derive grammars that capture the linguistic phenomena of control verb nesting and verb raising. The grammars, paired with a small lexicon, provide us with a large collection of naturalistic utterances, annotated with verb-subject pairings, that serve as the evaluation test bed for an attention-based span selection probe. Our results, backed by extensive analysis, suggest that the models investigated fail in the implicit acquisition of the dependencies examined.
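
To make the notion of a non-context-free verb-subject pairing concrete, here is a minimal Python sketch of the textbook Dutch verb-raising pattern, in which the i-th noun is the subject of the i-th verb. It is an illustration only, not the paper's grammar, lexicon, or probe: the function names `cross_serial` and `count_crossings`, the index-pair annotation format, and the example clause are all assumptions introduced here.

```python
from typing import List, Tuple


def cross_serial(nouns: List[str], verbs: List[str]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Build a toy subordinate clause of the shape: dat N1 ... Nk V1 ... Vk.

    Returns the tokens and a list of (verb_index, subject_index) pairs,
    where the i-th verb is paired with the i-th noun (hypothetical format).
    """
    if len(nouns) != len(verbs):
        raise ValueError("need exactly one subject per verb")
    k = len(nouns)
    tokens = ["dat"] + nouns + verbs
    # Nouns occupy positions 1..k, verbs occupy positions k+1..2k.
    pairs = [(1 + k + i, 1 + i) for i in range(k)]
    return tokens, pairs


def count_crossings(pairs: List[Tuple[int, int]]) -> int:
    """Count pairs of dependency arcs that cross each other."""
    arcs = [tuple(sorted(p)) for p in pairs]
    crossings = 0
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one lies inside the other.
            if a < c < b < d or c < a < d < b:
                crossings += 1
    return crossings


# Textbook example: "dat Jan Piet Marie zag helpen zwemmen"
# (roughly: "that Jan saw Piet help Marie swim").
tokens, pairs = cross_serial(["Jan", "Piet", "Marie"], ["zag", "helpen", "zwemmen"])
for verb_idx, subj_idx in pairs:
    print(f"{tokens[verb_idx]} <- {tokens[subj_idx]}")
print("crossing arc pairs:", count_crossings(pairs))
```

Every arc in this toy clause crosses every other arc, which is the property that places such dependencies outside what a context-free grammar can pair up, and it is pairings of this kind that a span selection probe would be asked to recover.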

Code Implementations: 1 repo