CL · Oct 20, 2021

Discontinuous Grammar as a Foreign Language

Daniel Fernández-González, Carlos Gómez-Rodríguez

arXiv:2110.10431v21.010 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for accurate parsing of complex syntactic phenomena in AI systems that process text and speech. It is an incremental improvement over existing methods.

The paper tackles syntactic constituent parsing for natural language understanding by extending sequence-to-sequence models to handle discontinuous grammatical structures, achieving state-of-the-art scores on the discontinuous English Penn Treebank.

In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank.
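To make the idea of "parsing as machine translation" concrete, the sketch below turns a small continuous constituent tree into a flat token sequence that a sequence-to-sequence model could be trained to emit. It is only an illustration under stated assumptions: the `Node` class, the `linearize` function, the `XX` word placeholder, and the bracket token format are all hypothetical choices made here, not the linearizations proposed in the paper. In particular, this simple bracketing cannot represent discontinuous constituents.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A constituent with a label and a list of children (sub-constituents or words)."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def linearize(tree: Union[Node, str]) -> List[str]:
    """Top-down bracketed linearization (hypothetical scheme for illustration).

    Each word is replaced by the placeholder token "XX", so the output
    sequence describes only the tree structure, not the words themselves.
    """
    if isinstance(tree, str):
        return ["XX"]
    tokens = ["(" + tree.label]          # opening bracket carries the label
    for child in tree.children:
        tokens.extend(linearize(child))
    tokens.append(")" + tree.label)      # closing bracket repeats the label
    return tokens


# Toy example: the "source sentence" is the word sequence,
# the "target sentence" is the linearized tree.
tree = Node("S", [
    Node("NP", ["She"]),
    Node("VP", ["reads", Node("NP", ["books"])]),
])
source = ["She", "reads", "books"]
target = linearize(tree)
print(" ".join(target))
```

Because every constituent here covers a contiguous span of words, matching brackets are enough to recover the tree. A discontinuous constituent has words that are not adjacent, which is why the paper needs linearizations beyond this kind of plain bracketing.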

