Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation
This addresses the need for more flexible parsing in computational linguistics, allowing for better handling of complex linguistic phenomena like traces and shared arguments, though it is incremental in extending existing parsing methods.
The paper tackles the problem of parsing graph-structured linguistic data, which typically violates tree constraints, by proposing a new representation and algorithm that efficiently handles directed, acyclic, one-endpoint-crossing graphs, covering 97.3% of the Penn English Treebank.
General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.