CL, AI, LG
Oct 5, 2021

Co-training an Unsupervised Constituency Parser with Weak Supervision

arXiv:2110.02283v2639 citations
Has Code
Originality: Incremental advance
AI Analysis

The method improves parsing accuracy for languages with limited labeled data, though the advance is incremental because it builds on prior weak-supervision methods.

The paper tackles unsupervised constituency parsing by co-training inside and outside classifiers with weak supervision, achieving 63.1 F1 on the English PTB test set and state-of-the-art results on Chinese and Japanese treebanks.

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence. There are two types of classifiers, an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both, and as a result, effectively parse. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics injects strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art results. Our code and pre-trained models are available at https://github.com/Nickil21/weakly-supervised-parsing.
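
To make the inside/outside distinction in the abstract concrete, here is a minimal sketch of how a sentence might be split into the two views for a candidate span, and how a branching prior could yield weakly labeled seed spans. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `<span>` placeholder token, and the rule of treating every span of a purely right- or left-branching tree as a seed are all hypothetical. The linked repository is the reference for the actual method.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token indices [start, end)


def inside_view(tokens: List[str], span: Span) -> List[str]:
    """Tokens covered by the span: the input an inside classifier would see."""
    start, end = span
    return tokens[start:end]


def outside_view(tokens: List[str], span: Span, mask: str = "<span>") -> List[str]:
    """Everything outside the span, with the span collapsed to a placeholder.

    The placeholder token is an assumption made for this sketch.
    """
    start, end = span
    return tokens[:start] + [mask] + tokens[end:]


def branching_seed_spans(n_tokens: int, direction: str = "right") -> List[Span]:
    """Multi-token spans of a purely right- or left-branching binary tree.

    Hypothetical seeding rule: these spans stand in for weak supervision
    derived from prior branching knowledge of the language.
    """
    if direction == "right":
        # Right-branching: every constituent ends at the last token.
        return [(i, n_tokens) for i in range(n_tokens - 1)]
    if direction == "left":
        # Left-branching: every constituent starts at the first token.
        return [(0, j) for j in range(2, n_tokens + 1)]
    raise ValueError("direction must be 'right' or 'left'")


tokens = "the cat sat on the mat".split()
seeds = branching_seed_spans(len(tokens), direction="right")
for span in seeds:
    print(span, inside_view(tokens, span), outside_view(tokens, span))
```

In a co-training setup of the kind the abstract describes, one classifier would be trained on the inside views and the other on the outside views, with each classifier's confident predictions used to label additional spans for the other.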

Code Implementations: 1 repo