CL · Feb 28, 2019

Better, Faster, Stronger Sequence Tagging Constituent Parsers

David Vilares, Mostafa Abdou, Anders Søgaard

arXiv:1902.10985v331.11104 citations · Has Code

Originality: Incremental advance

AI Analysis

This work improves both the accuracy and the speed of sequence tagging constituent parsers, which benefits applications that need efficient constituent parsing. It is an incremental advance, since it builds on existing sequence tagging methods.

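The paper targets three weaknesses of sequence tagging constituent parsers: high error rates around the closing brackets of long constituents, sparsity caused by large label sets, and error propagation from greedy decoding. It addresses them by learning to switch between tagging schemes, by multi-task learning over decomposed labels, and by auxiliary losses combined with policy-gradient fine-tuning. The resulting parsers surpass previous sequence tagging parsers on the English and Chinese Penn Treebanks and set a new state of the art on the Basque, Hebrew, Polish, and Swedish SPMRL datasets.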
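The idea of switching tagging schemes can be illustrated on a toy tree. This is a minimal sketch, assuming the per-word encoding used by earlier sequence-labeling constituent parsers: each word is tagged with the number of ancestors it shares with the next word (absolute scale), or with the change in that number from the previous word (relative scale), plus the label of their lowest common ancestor. Preterminals and unary chains are omitted for brevity. The switching rule in `to_dynamic` (use the absolute scale when the count is at most `max_abs=3` and the relative value is at most `max_rel=-2`) is an illustrative assumption, not necessarily the paper's exact rule, and all function names are hypothetical.

```python
def leaf_paths(tree, prefix=()):
    """Yield (word, ancestors), where ancestors are (node_id, label) pairs from the root down.

    A tree is (label, children); leaves are plain word strings. Node ids are the
    child-index paths from the root, so equal labels at different positions differ.
    """
    label, children = tree
    node = (prefix, label)
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield child, [node]
        else:
            for word, ancestors in leaf_paths(child, prefix + (i,)):
                yield word, [node] + ancestors


def absolute_labels(tree):
    """(number of common ancestors, LCA label) for each adjacent word pair; the last word gets none."""
    paths = [ancestors for _, ancestors in leaf_paths(tree)]
    labels = []
    for left, right in zip(paths, paths[1:]):
        k = 0
        while k < min(len(left), len(right)) and left[k] == right[k]:
            k += 1
        labels.append((k, left[k - 1][1]))  # left[k - 1] is the lowest common ancestor
    return labels


def to_relative(levels):
    """Relative scale: difference from the previous word's level (the first is kept as-is)."""
    return [lvl - prev for prev, lvl in zip([0] + levels[:-1], levels)]


def to_dynamic(levels, max_abs=3, max_rel=-2):
    """Tag each level as ('A', absolute) or ('R', relative).

    ASSUMPTION: illustrative thresholds. A large negative relative value means many
    brackets close at once, so a small absolute count is used instead.
    """
    tags = []
    for abs_lvl, rel_lvl in zip(levels, to_relative(levels)):
        if abs_lvl <= max_abs and rel_lvl <= max_rel:
            tags.append(("A", abs_lvl))
        else:
            tags.append(("R", rel_lvl))
    return tags


def decode_dynamic(tags):
    """Recover absolute levels: absolute tags reset the count, relative tags add to it."""
    levels, current = [], 0
    for scale, value in tags:
        current = value if scale == "A" else current + value
        levels.append(current)
    return levels


# Toy tree with a deeply nested subject NP that closes five brackets after "d".
tree = ("S", [
    ("NP", ["a", ("PP", ["of", ("NP", ["b", ("PP", ["of", ("NP", ["c", "d"])])])])]),
    ("VP", ["e", "f"]),
])

levels = [k for k, _ in absolute_labels(tree)]
print(levels)               # [2, 3, 4, 5, 6, 1, 2]
print(to_relative(levels))  # [2, 1, 1, 1, 1, -5, 1]
print(to_dynamic(levels))   # [('R', 2), ('R', 1), ('R', 1), ('R', 1), ('R', 1), ('A', 1), ('R', 1)]
```

In the toy tree, closing the nested brackets after `d` costs a rare relative tag of -5, whereas the absolute tag 1 is frequent and easy to predict. The dynamic tags still decode back to the same levels.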

Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish.
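To reduce sparsity, the abstract describes decomposing the label set and learning to predict the sublabels jointly. The plain-Python sketch below illustrates one such factorisation, assuming each composite tag is a string of the hypothetical form `level_nonterminal[_unary]`, in line with the (level, nonterminal, unary chain) triples of the sequence-labeling encoding this work builds on. Each sublabel task gets its own softmax, and the training loss is the unweighted sum of the per-task cross-entropies; this is a common multi-task setup, and the paper's exact architecture and loss weighting may differ. All names are hypothetical.

```python
import math

TASKS = ("level", "nonterminal", "unary")


def decompose(label):
    """Split a composite tag into sublabels.

    ASSUMPTION: hypothetical format 'level_nonterminal[_unary]'; a missing unary field is 'NONE'.
    """
    parts = label.split("_")
    return {
        "level": parts[0],
        "nonterminal": parts[1],
        "unary": parts[2] if len(parts) > 2 else "NONE",
    }


def build_vocabs(labels):
    """One small output vocabulary per sublabel task, in first-seen order."""
    vocabs = {t: [] for t in TASKS}
    for label in labels:
        for task, value in decompose(label).items():
            if value not in vocabs[task]:
                vocabs[task].append(value)
    return vocabs


def cross_entropy(logits, gold_index):
    """Numerically stable softmax cross-entropy for a single token."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold_index]


def multitask_loss(task_logits, gold_label, vocabs):
    """Sum of the per-task losses for one token (unweighted, an assumption)."""
    gold = decompose(gold_label)
    return sum(cross_entropy(task_logits[t], vocabs[t].index(gold[t])) for t in TASKS)


def recompose(task_logits, vocabs):
    """Greedily pick the best sublabel for each task and join them into one tag."""
    best = {}
    for t in TASKS:
        scores = task_logits[t]
        best[t] = vocabs[t][max(range(len(scores)), key=scores.__getitem__)]
    tag = f"{best['level']}_{best['nonterminal']}"
    return tag if best["unary"] == "NONE" else f"{tag}_{best['unary']}"


train_labels = ["1_NP", "2_PP", "-1_S", "1_VP", "2_NP_VP"]
vocabs = build_vocabs(train_labels)

# Scores favouring level "-1" and nonterminal "PP": the factored model can emit
# "-1_PP" even though that composite tag never occurs in train_labels.
logits = {"level": [0.0, 0.0, 2.0], "nonterminal": [0.0, 3.0, 0.0, 0.0], "unary": [1.0, 0.0]}
print(recompose(logits, vocabs))  # -1_PP
```

A single softmax over composite tags would need one output per observed combination and could never produce an unseen one, whereas the factored version only needs one output per sublabel value in each task.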

