CL · Feb 24, 2019

Unlexicalized Transition-based Discontinuous Constituency Parsing

Maximin Coavoux, Benoît Crabbé, Shay B. Cohen

arXiv:1902.08912v131.11100 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the parsing of discontinuous structures in natural language processing and provides empirical evidence that lexicalization is not necessary for it. The advance is incremental: it challenges existing assumptions but builds on prior parsing methods.

The paper asks whether lexicalization is necessary for discontinuous constituency parsing and introduces an unlexicalized transition-based parser to test the question. Unlexicalized models systematically outperformed lexicalized ones and achieved new state-of-the-art results on English and German treebanks.

Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head, and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it to lexicalized parsing models in order to address the question of lexicalization in the context of discontinuous constituency parsing. Our experiments show that unlexicalized models systematically achieve higher results than lexicalized models, and provide additional empirical evidence that lexicalization is not necessary to achieve strong parsing results. Our best unlexicalized model sets a new state of the art on English and German discontinuous constituency treebanks. We further provide a per-phenomenon analysis of its errors on discontinuous constituents.
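
For readers new to the term, a constituent is discontinuous when the tokens it covers do not form a single contiguous span of the sentence. The sketch below illustrates that definition only; it is not the paper's parser or code. The function names `yield_blocks` and `is_discontinuous` and the example sentence are assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple


def yield_blocks(token_indices: Iterable[int]) -> List[Tuple[int, int]]:
    """Group token positions into maximal contiguous blocks.

    Each block is an inclusive (start, end) pair of token positions.
    Hypothetical helper for illustration; not taken from the paper.
    """
    blocks: List[Tuple[int, int]] = []
    for pos in sorted(set(token_indices)):
        if blocks and pos == blocks[-1][1] + 1:
            # Position extends the current block by one token.
            blocks[-1] = (blocks[-1][0], pos)
        else:
            # Position starts a new block (there is a gap before it).
            blocks.append((pos, pos))
    return blocks


def is_discontinuous(token_indices: Iterable[int]) -> bool:
    """A constituent is discontinuous if its yield has more than one block."""
    return len(yield_blocks(token_indices)) > 1


# Illustrative sentence (an assumption, not an example from the paper):
# 0:A 1:hearing 2:is 3:scheduled 4:on 5:the 6:issue 7:today
# The noun phrase "A hearing ... on the issue" covers tokens 0, 1, 4, 5, 6.
noun_phrase = [0, 1, 4, 5, 6]
verb_group = [2, 3]

print(yield_blocks(noun_phrase))
print(is_discontinuous(noun_phrase), is_discontinuous(verb_group))
```

In this example the noun phrase is interrupted by "is scheduled", so its yield splits into two blocks, while the verb group is an ordinary contiguous span.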

