The Return of Lexical Dependencies: Neural Lexicalized PCFGs
This work addresses unsupervised grammar induction, presenting a unified neural framework that overcomes the sparsity problems of lexicalized PCFGs.
The paper shows that modeling lexical dependencies in context-free grammar based induction methods yields stronger results on both constituents and dependencies than modeling either formalism alone.
In this paper we demonstrate that $\textit{context-free grammar (CFG) based methods for grammar induction benefit from modeling lexical dependencies}$. This contrasts with the most popular current methods for grammar induction, which focus on discovering $\textit{either}$ constituents $\textit{or}$ dependencies. Previous attempts to marry these two disparate syntactic formalisms (e.g. lexicalized PCFGs) have been plagued by sparsity, making them unsuitable for unsupervised grammar induction. In this work, we present novel neural models of lexicalized PCFGs that overcome these sparsity problems and effectively induce both constituents and dependencies within a single model. Experiments demonstrate that this unified framework yields stronger results on both representations than those achieved when modeling either formalism alone. Code is available at https://github.com/neulab/neural-lpcfg.
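
To make the sparsity problem mentioned above concrete, the following back-of-the-envelope sketch compares the size of an explicit probability table for lexicalized binary rules of the form A[h] -> B[h] C[h'] with the size of a shared embedding table. It is an illustration only and not the parameterization used in the paper: the function names, the sizes (30 nonterminals, 10,000 words, 256-dimensional embeddings), and the restriction to one head direction are assumptions chosen for the example, and the embedding count ignores the weights of any scoring network.

```python
def tabular_rule_params(num_nonterminals: int, vocab_size: int) -> int:
    """Entries in an explicit table for rules A[h] -> B[h] C[h'].

    One entry per combination of parent A, children B and C,
    head word h, and dependent word h' (one head direction only).
    """
    return num_nonterminals ** 3 * vocab_size ** 2


def embedding_params(num_nonterminals: int, vocab_size: int, dim: int) -> int:
    """Entries in a shared embedding table.

    One dim-sized vector per word and per nonterminal symbol;
    scoring-network weights are deliberately left out of the count.
    """
    return (num_nonterminals + vocab_size) * dim


# Hypothetical sizes, chosen only to illustrate the scaling.
NUM_NONTERMINALS, VOCAB_SIZE, EMBED_DIM = 30, 10_000, 256

table = tabular_rule_params(NUM_NONTERMINALS, VOCAB_SIZE)
shared = embedding_params(NUM_NONTERMINALS, VOCAB_SIZE, EMBED_DIM)

print(f"explicit rule table: {table:,} entries")
print(f"shared embeddings:   {shared:,} entries")
```

The explicit table grows with the square of the vocabulary size, so most of its entries would never be observed in a training corpus, whereas a model that scores rules from shared word and symbol representations grows only linearly in the vocabulary.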