CL LG | Apr 21, 2021

Improving BERT Pretraining with Syntactic Supervision

Giorgos Tziafas, Konstantinos Kogkalidis, Gijs Wijnholds, Michael Moortgat

arXiv:2104.10516v119.5211 citations | Has Code

Originality: Incremental advance

AI Analysis

This work targets the limited syntactic generalization of NLP models. It is an incremental advance: it extends existing pretraining methods with one additional objective.

The paper tackles BERT's limited syntactic generalization by adding a supervised supertagging objective during pretraining. The resulting syntax-aware model performs on par with established baselines, even though its training corpus is roughly ten times smaller than those commonly used.

Bidirectional masked Transformers have become the core theme in the current NLP landscape. Despite their impressive benchmarks, a recurring theme in recent research has been to question such models' capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network's training dynamics. Our approach is straightforward to implement, induces a marginal computational overhead and is general enough to adapt to a variety of settings. We apply our methodology on Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being one order of magnitude smaller than commonly used corpora.
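
As a rough illustration of what combining the two objectives could look like, the sketch below adds a token-level supertagging cross-entropy to a masked-language-modelling cross-entropy. It is not the authors' implementation: the function names, the `tag_weight` parameter, the plain weighted sum, and the `-100` "ignore" label are all assumptions made for this example, and the abstract does not say how the two losses are actually balanced.

```python
import math

IGNORE = -100  # assumed sentinel for positions that carry no label


def cross_entropy(logits, targets, ignore_index=IGNORE):
    """Mean cross-entropy over tokens, skipping positions labelled ignore_index."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]
        count += 1
    return total / count if count else 0.0


def joint_loss(mlm_logits, mlm_targets, tag_logits, tag_targets, tag_weight=1.0):
    """Hypothetical joint objective: MLM loss plus a weighted supertagging loss."""
    mlm = cross_entropy(mlm_logits, mlm_targets)  # scored at masked positions only
    tag = cross_entropy(tag_logits, tag_targets)  # scored at every labelled token
    return mlm + tag_weight * tag


# Toy batch: 3 tokens, a 4-word vocabulary, 3 supertags; only token 1 is masked.
mlm_logits = [[0.0] * 4 for _ in range(3)]
mlm_targets = [IGNORE, 2, IGNORE]
tag_logits = [[0.0] * 3 for _ in range(3)]
tag_targets = [0, 1, 2]

loss = joint_loss(mlm_logits, mlm_targets, tag_logits, tag_targets)
```

In this sketch the masked-word loss is computed only where a token was masked, while the supertagging loss sees every token, which is one way a supervised signal can reach positions the unsupervised objective ignores.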
