CL, May 27, 2020

Enriched In-Order Linearization for Faster Sequence-to-Sequence Constituent Parsing

Daniel Fernández-González, Carlos Gómez-Rodríguez

arXiv:2005.13334v131.1998 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses parsing efficiency and accuracy for NLP researchers, presenting an incremental improvement over existing methods.

The paper tackles sequence-to-sequence constituent parsing by proposing an enriched in-order linearization. It achieves the best accuracy to date on the English PTB dataset among fully-supervised single-model sequence-to-sequence constituent parsers, and it matches state-of-the-art transition-based parsers in speed.

Sequence-to-sequence constituent parsing requires a linearization to represent trees as sequences. Top-down tree linearizations, which can be based on brackets or shift-reduce actions, have achieved the best accuracy to date. In this paper, we show that these results can be improved by using an in-order linearization instead. Based on this observation, we implement an enriched in-order shift-reduce linearization inspired by Vinyals et al. (2015)'s approach, achieving the best accuracy to date on the English PTB dataset among fully-supervised single-model sequence-to-sequence constituent parsers. Finally, we apply deterministic attention mechanisms to match the speed of state-of-the-art transition-based parsers, thus showing that sequence-to-sequence models can match them, not only in accuracy, but also in speed.
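To make the difference between the linearizations concrete, here is a minimal sketch in Python, not the authors' implementation. The tree encoding, the function names, and the action names (SH, NT-X, PJ-X, RE, RE-X) are illustrative assumptions. The label-annotated Reduce variant is likewise an assumption about what "enriched" means, by analogy with the labelled closing brackets of Vinyals et al. (2015); the paper itself should be consulted for the exact definition. POS tags and any final "finish" action are omitted.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding: a leaf is a word (str),
# an internal node is a (label, children) pair.
Tree = Union[str, Tuple[str, List["Tree"]]]


def top_down(tree: Tree) -> List[str]:
    """Top-down shift-reduce linearization: open the constituent first,
    then emit all of its children, then close it."""
    if isinstance(tree, str):
        return ["SH"]  # shift one word
    label, children = tree
    actions = [f"NT-{label}"]
    for child in children:
        actions += top_down(child)
    actions.append("RE")
    return actions


def in_order(tree: Tree, label_reduce: bool = False) -> List[str]:
    """In-order shift-reduce linearization: emit the first child, then
    project the constituent label, then the remaining children, then close.
    With label_reduce=True each Reduce also carries the label (an assumed
    reading of the 'enriched' variant)."""
    if isinstance(tree, str):
        return ["SH"]
    label, children = tree
    actions = in_order(children[0], label_reduce)
    actions.append(f"PJ-{label}")
    for child in children[1:]:
        actions += in_order(child, label_reduce)
    actions.append(f"RE-{label}" if label_reduce else "RE")
    return actions


# (S (NP The cat) (VP sleeps))
example: Tree = ("S", [("NP", ["The", "cat"]), ("VP", ["sleeps"])])

if __name__ == "__main__":
    print(" ".join(top_down(example)))
    print(" ".join(in_order(example)))
    print(" ".join(in_order(example, label_reduce=True)))
```

For the example tree, the top-down sequence opens S before any word has been read, whereas the in-order sequence first shifts "The", then projects NP, and only projects S once the whole NP is complete. Both sequences have the same length; they differ only in when each nonterminal label is predicted.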
