CLSep 10, 2018

Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction

arXiv:1809.03112v11092 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of recursive complexity in grammar induction for computational linguistics, offering an incremental improvement over existing depth-bounding techniques.

The paper tackled the problem of improving unsupervised grammar induction by applying depth-bounding within a chart-based Bayesian PCFG inducer, showing that it significantly increases parsing accuracy and produces results competitive with state-of-the-art models on English, Chinese, and German.

There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016; Jin et al., 2018). Modern depth-bounded grammar inducers have been shown to be more accurate than early unbounded PCFG inducers, but this technique has never been compared against unbounded induction within the same system, in part because most previous depth-bounding models are built around sequence models, the complexity of which grows exponentially with the maximum allowed depth. The present work instead applies depth bounds within a chart-based Bayesian PCFG inducer (Johnson et al., 2007b), where bounding can be switched on and off, and then samples trees with and without bounding. Results show that depth-bounding is indeed significantly effective in limiting the search space of the inducer and thereby increasing the accuracy of the resulting parsing model. Moreover, parsing results on English, Chinese and German show that this bounded model with a new inference technique is able to produce parse trees more accurately than or competitively with state-of-the-art constituency-based grammar induction models.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes