Unsupervised Grammar Induction with Depth-bounded PCFG
This work addresses grammar acquisition for natural language processing, offering an incremental improvement by applying depth-bounding to reduce the search space of PCFG induction models.
The paper tackles the problem of unsupervised grammar induction by extending depth-bounding to probabilistic context-free grammars (DB-PCFG), resulting in competitive or superior parse accuracy on child-directed speech and newswire text, with grammars showing consistent use of category labels.
There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.
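To make the notion of a depth bound concrete, the following is a minimal sketch of how a bound on recursion depth could exclude candidate trees from the search space. It is not the DB-PCFG model itself. It assumes one informal definition of center-embedding depth (depth starts at 1 and grows by one whenever a complex left child sits under a right child), which may differ in detail from the definition used in this work. The tuple tree encoding, the function names `embedding_depth` and `within_bound`, and the category labels are hypothetical and chosen only for illustration.

```python
# Trees are nested tuples (an assumed encoding, not taken from the paper):
#   preterminal: (label, word)
#   binary node: (label, left_subtree, right_subtree)

def embedding_depth(tree, is_right_child=False, depth=1):
    """Return the maximum center-embedding depth reached anywhere in the tree.

    Assumption: depth increases by one when a complex (non-preterminal)
    left child hangs under a node that is itself a right child.
    """
    if len(tree) == 2:  # preterminal: nothing below it can add depth
        return depth
    _, left, right = tree
    left_is_complex = len(left) == 3
    left_depth = depth + 1 if (is_right_child and left_is_complex) else depth
    return max(
        embedding_depth(left, False, left_depth),
        embedding_depth(right, True, depth),
    )


def within_bound(tree, max_depth):
    """True if the tree never exceeds the given depth bound."""
    return embedding_depth(tree) <= max_depth


# A purely right-branching tree stays at depth 1 under this definition.
right_branching = ("S", ("D", "the"), ("X", ("N", "dog"), ("V", "barked")))

# A complex left child under a right child pushes the depth to 2.
center_embedded = (
    "S",
    ("N", "rats"),
    ("X", ("Y", ("N", "cats"), ("V", "chase")), ("V", "flee")),
)

print(within_bound(right_branching, 1))
print(within_bound(center_embedded, 1))
print(within_bound(center_embedded, 2))
```

Under a bound of 1, the second tree would be excluded from consideration, which illustrates how a small depth bound shrinks the set of analyses an induction model has to consider.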