CL, LG | Apr 21, 2021

Improving BERT Pretraining with Syntactic Supervision

arXiv:2104.10516v1211 citations
Originality: Incremental advance
AI Analysis

This work targets the limited syntactic generalization of pretrained NLP models. It is an incremental advance: it extends existing pretraining methods with one additional objective.

The paper tackles BERT's limited syntactic generalization by adding a supervised supertagging objective during pretraining. The resulting syntax-aware model performs on par with established baselines even though its training corpus, Lassy Large, is about ten times smaller than commonly used corpora.

Bidirectional masked Transformers have become the core theme in the current NLP landscape. Despite their impressive benchmarks, a recurring theme in recent research has been to question such models' capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network's training dynamics. Our approach is straightforward to implement, induces a marginal computational overhead and is general enough to adapt to a variety of settings. We apply our methodology on Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being one order of magnitude smaller than commonly used corpora.
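
The idea of pairing masked-language-model pretraining with a token-level supertagging objective can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the `tag_weight` parameter, and the choice to combine the two losses as a weighted sum are assumptions made for illustration only. The abstract does not state how the objectives are combined.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(mlm_logits, mlm_targets, tag_logits, tag_targets, tag_weight=1.0):
    """Hypothetical joint objective: MLM loss + tag_weight * supertagging loss.

    mlm_targets[i] is None for positions that were not masked, so they
    contribute no masked-LM loss. Every position has a supertag target.
    The weighted sum is an assumption, not taken from the paper.
    """
    masked = [(l, t) for l, t in zip(mlm_logits, mlm_targets) if t is not None]
    mlm = sum(cross_entropy(l, t) for l, t in masked) / max(len(masked), 1)
    tag = sum(cross_entropy(l, t) for l, t in zip(tag_logits, tag_targets))
    tag /= max(len(tag_targets), 1)
    return mlm + tag_weight * tag


# Toy example: 2 tokens, vocabulary of size 2, 4 possible supertags.
loss = joint_loss(
    mlm_logits=[[0.0, 0.0], [0.0, 0.0]],
    mlm_targets=[None, 1],  # only the second token was masked
    tag_logits=[[0.0] * 4, [0.0] * 4],
    tag_targets=[0, 3],
    tag_weight=0.5,
)
print(loss)
```

In a real setup both sets of logits would come from separate output heads on a shared Transformer encoder, so the supertagging gradient shapes the same representations that the masked-LM objective trains.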

Code Implementations: 1 repo
