CL, Mar 3, 2021

An Empirical Study of Compound PCFGs

arXiv:2103.02298v2804 citations
AI Analysis

This is an incremental empirical evaluation of C-PCFGs, addressing their scalability and cross-lingual applicability for NLP researchers.

The study analyzed compound probabilistic context-free grammars (C-PCFGs) for unsupervised grammar induction, finding they are data-efficient and generalize to unseen lengths but do not always transfer well from English to morphology-rich languages.

Compound probabilistic context-free grammars (C-PCFGs) have recently established a new state of the art for unsupervised phrase-structure grammar induction. However, due to the high space and time complexities of chart-based representation and inference, it is difficult to investigate C-PCFGs comprehensively. In this work, we rely on a fast implementation of C-PCFGs to conduct an evaluation complementary to that of Kim et al. (2019). We start by analyzing and ablating C-PCFGs on English treebanks. Our findings suggest that (1) C-PCFGs are data-efficient and can generalize to unseen sentence/constituent lengths; and (2) C-PCFGs make the best use of sentence-level information in generating preterminal rule probabilities. We further conduct a multilingual evaluation of C-PCFGs. The experimental results show that the best configurations of C-PCFGs, which are tuned on English, do not always generalize to morphology-rich languages.
