Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
This work addresses the need for standardized linguistic annotation across languages, which facilitates multilingual NLP research and applications. Its contribution is incremental, since it revises and extends the earlier version of the guidelines rather than proposing a new framework.
The paper introduces Universal Dependencies v2, an updated multilingual treebank collection with cross-linguistically consistent annotation for 90 languages, detailing guideline changes from version 1 and providing an overview of available resources.
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers. In this paper, we describe version 2 of the guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.
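To make the three annotation layers concrete, here is a minimal sketch of reading one sentence in CoNLL-U, the ten-column, tab-separated format in which UD treebanks are distributed. The sample sentence and its annotation are illustrative assumptions and are not taken from any treebank. The names `Token`, `parse_feats`, and `parse_sentence` are hypothetical helpers, not part of any UD tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    id: int                 # position of the syntactic word in the sentence
    form: str               # word form (segmentation layer)
    lemma: str              # lemma (morphological layer)
    upos: str               # universal part-of-speech tag
    feats: Dict[str, str]   # morphological features, e.g. {"Number": "Sing"}
    head: int               # id of the syntactic head, 0 for the root
    deprel: str             # dependency relation to the head


def parse_feats(raw: str) -> Dict[str, str]:
    """Turn 'Number=Sing|Person=3' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_sentence(block: str) -> List[Token]:
    """Parse the basic word lines of one CoNLL-U sentence block."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append(Token(
            id=int(cols[0]),
            form=cols[1],
            lemma=cols[2],
            upos=cols[3],
            feats=parse_feats(cols[5]),
            head=int(cols[6]),
            deprel=cols[7],
        ))
    return tokens


# Illustrative sentence (an assumption, built with explicit tabs for clarity).
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "The", "the", "DET", "_", "Definite=Def|PronType=Art", "2", "det", "_", "_"],
    ["2", "dog", "dog", "NOUN", "_", "Number=Sing", "3", "nsubj", "_", "_"],
    ["3", "barks", "bark", "VERB", "_",
     "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
])

if __name__ == "__main__":
    for tok in parse_sentence(SAMPLE):
        print(tok.id, tok.form, tok.lemma, tok.upos, tok.head, tok.deprel)
```

The sketch keeps only the columns that correspond to the layers named in the abstract and assumes every word line carries an integer head. A real reader would also handle the language-specific tag, enhanced dependencies, and the miscellaneous column.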