CLMar 30, 2020

Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

arXiv:2003.13785v1998 citations
AI Analysis

This work addresses the challenge of efficient parsing for discontinuous structures in computational linguistics, offering a significant speed improvement while maintaining high coverage, though it is incremental in optimizing existing methods.

The paper tackles the problem of parsing discontinuous constituency trees with block degree two, introducing a chart-based algorithm that achieves time complexities from O(n^6) to O(n^3), with the cubic variant covering 98% of constituents in treebanks and matching the complexity of continuous parsers. It reports state-of-the-art results on German and English treebanks in fully supervised settings.

We introduce a novel chart-based algorithm for span-based parsing of discontinuous constituency trees of block degree two, including ill-nested structures. In particular, we show that we can build variants of our parser with smaller search spaces and time complexities ranging from $\mathcal O(n^6)$ down to $\mathcal O(n^3)$. The cubic time variant covers 98\% of constituents observed in linguistic treebanks while having the same complexity as continuous constituency parsers. We evaluate our approach on German and English treebanks (Negra, Tiger and Discontinuous PTB) and report state-of-the-art results in the fully supervised setting. We also experiment with pre-trained word embeddings and \bert{}-based neural networks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes