Supertagging-based Parsing with Linear Context-free Rewriting Systems
This work addresses efficient and accurate discontinuous parsing of natural language, with experiments on English and German. It represents an incremental improvement over existing methods.
The authors tackle parsing with Linear Context-free Rewriting Systems (LCFRS) by developing the first supertagging-based parser for this formalism. It significantly outperforms previous LCFRS parsers in accuracy and speed, and its scores for discontinuous constituents compete with those of the best general discontinuous parsers.
We present the first supertagging-based parser for LCFRS. It utilizes neural classifiers and tremendously outperforms previous LCFRS-based parsers in both accuracy and parsing speed. Moreover, our results keep up with the best (general) discontinuous parsers; in particular, the scores for discontinuous constituents are excellent. The heart of our approach is an efficient lexicalization procedure that induces a lexical LCFRS from any discontinuous treebank. It is an adaptation of previous work by Mörbitz and Ruprecht (2020). We also describe a modification to the usual chart-based LCFRS parsing that accounts for supertagging, and we introduce a procedure for transforming lexical LCFRS derivations into equivalent parse trees of the original treebank. Our approach is implemented and evaluated on the English Discontinuous Penn Treebank and the German corpora NeGra and Tiger.
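To make the notion of a discontinuous constituent concrete, the following is a minimal illustrative sketch and not part of the described parser. It groups the token positions covered by a constituent into maximal contiguous blocks; a constituent with more than one block is discontinuous, and the block count is the fan-out that an LCFRS nonterminal needs in order to cover it. The function names and the example sentence are assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple


def yield_blocks(positions: Iterable[int]) -> List[Tuple[int, int]]:
    """Group token positions into maximal contiguous blocks (inclusive ranges)."""
    blocks: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        if blocks and pos == blocks[-1][1] + 1:
            # Position directly follows the current block: extend it.
            blocks[-1] = (blocks[-1][0], pos)
        else:
            # Gap in the yield: start a new block.
            blocks.append((pos, pos))
    return blocks


def is_discontinuous(positions: Iterable[int]) -> bool:
    """A constituent is discontinuous if its yield has more than one block."""
    return len(yield_blocks(positions)) > 1


# Hypothetical example sentence (0-based token positions):
# A(0) hearing(1) is(2) scheduled(3) on(4) the(5) issue(6) today(7)
# The noun phrase "A hearing ... on the issue" covers positions 0, 1, 4, 5, 6.
np_yield = [0, 1, 4, 5, 6]
print(yield_blocks(np_yield))      # [(0, 1), (4, 6)]
print(is_discontinuous(np_yield))  # True
```

In this example the noun phrase has two blocks, so a grammar covering it needs a nonterminal of fan-out 2. A context-free grammar is limited to fan-out 1.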