CL Dec 2, 2020

Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories

Jakob Prange, Nathan Schneider, Vivek Srikumar

arXiv:2012.01285v226.7651 citations Has Code

Originality: Highly original

AI Analysis

This work is significant for researchers and developers working with CCG parsing, as it improves the ability to handle complex and rare syntactic categories, leading to more robust and comprehensive parsing results.

This paper addresses the challenge of supertagging rare and complex CCG categories, which are traditionally discarded when the tagset is truncated. The authors propose constructive models that use the internal tree structure of supertags, enabling their best tagger to recover a sizeable fraction of long-tail supertags and even generate categories never seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters.

Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories' internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.
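To make the observation that supertags are themselves trees concrete, here is a minimal sketch that parses a CCG category string into a binary tree. It is not the authors' tagger or decoder; the names `Category`, `parse_category`, and `depth` are hypothetical, and the sketch assumes the usual convention that unparenthesized slashes associate to the left.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    """A CCG category: an atom (e.g. NP) or a slash node with two children."""
    label: str                              # atomic label, or the slash "/" or "\"
    result: Optional["Category"] = None     # left child of a slash node
    argument: Optional["Category"] = None   # right child of a slash node

    def is_atomic(self) -> bool:
        return self.result is None

    def __str__(self) -> str:
        if self.is_atomic():
            return self.label
        # Parenthesize complex children so the string can be parsed back.
        left, right = (
            str(c) if c.is_atomic() else f"({c})"
            for c in (self.result, self.argument)
        )
        return f"{left}{self.label}{right}"


def parse_category(text: str) -> Category:
    """Parse a category string such as '(S[dcl]\\NP)/NP' into a tree."""
    pos = 0

    def parse_operand() -> Category:
        nonlocal pos
        if pos >= len(text):
            raise ValueError("unexpected end of category")
        if text[pos] == "(":
            pos += 1
            node = parse_expr()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        start = pos
        # An atom runs until the next slash or parenthesis; features
        # such as [dcl] stay part of the atomic label.
        while pos < len(text) and text[pos] not in "/\\()":
            pos += 1
        if start == pos:
            raise ValueError(f"expected an atom at position {pos}")
        return Category(text[start:pos])

    def parse_expr() -> Category:
        nonlocal pos
        node = parse_operand()
        # Left-associative: X/Y/Z is read as (X/Y)/Z.
        while pos < len(text) and text[pos] in "/\\":
            slash = text[pos]
            pos += 1
            node = Category(slash, node, parse_operand())
        return node

    tree = parse_expr()
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return tree


def depth(cat: Category) -> int:
    """Number of slash levels on the longest path from the root to an atom."""
    if cat.is_atomic():
        return 0
    return 1 + max(depth(cat.result), depth(cat.argument))
```

For example, `parse_category("(S[dcl]\\NP)/NP")` yields a root `/` node whose result is the subtree for `S[dcl]\NP` and whose argument is the atom `NP`. A constructive tagger in the spirit of the abstract would predict such a tree piece by piece rather than selecting a whole category from a fixed, truncated tagset.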
