UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies
This work addresses a gap in linguistic annotation for researchers and practitioners who use UD, making it easier to compare constructions across languages. The contribution is incremental in that it builds on the existing UD framework.
The authors address the problem that Universal Dependencies (UD) annotations lack holistic labels for meaning-bearing grammatical constructions, such as interrogative sentences. They propose a 'UCxn' annotation layer and apply it in a typologically informed way to five construction families across ten languages. The study yields insights into methodology and lays the foundation for enriching UD treebanks.
The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements -- for example, interrogative sentences with special markers and/or word orders -- are not labeled holistically. We argue for (i) augmenting UD annotations with a 'UCxn' annotation layer for such meaning-bearing grammatical constructions, and (ii) approaching this in a typologically informed way so that morphosyntactic strategies can be compared across languages. As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns. In addition to findings regarding these particular constructions, our study yields important insights on methodology for describing and identifying constructions in language-general and language-particular ways, and lays the foundation for future constructional enrichment of UD treebanks.
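To make the idea of identifying constructions through morphosyntactic patterns concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single pattern (a word carrying the UD feature `PronType=Int` inside a clause) and records a match in the MISC column of the clause head. The label `Cxn=Interrogative-WH`, the helper names, and the hand-written sample sentence are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the pattern, label, and helper names are assumptions,
# not the UCxn authors' actual rules or label inventory.

# Hand-written CoNLL-U fragment for "What did you see ?" (10 tab-separated columns).
CONLLU = """\
# text = What did you see?
1\tWhat\twhat\tPRON\t_\tPronType=Int\t4\tobj\t_\t_
2\tdid\tdo\tAUX\t_\tTense=Past\t4\taux\t_\t_
3\tyou\tyou\tPRON\t_\tPronType=Prs\t4\tnsubj\t_\t_
4\tsee\tsee\tVERB\t_\tVerbForm=Inf\t0\troot\t_\t_
5\t?\t?\tPUNCT\t_\t_\t4\tpunct\t_\t_
"""

# Dependency relations treated here as heading a clause (an assumption of this sketch).
CLAUSAL = {"root", "ccomp", "xcomp", "advcl", "acl", "csubj", "parataxis"}


def parse_conllu(text):
    """Parse one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        feats = {} if cols[5] == "_" else dict(
            kv.split("=", 1) for kv in cols[5].split("|")
        )
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "upos": cols[3],
            "feats": feats, "head": int(cols[6]), "deprel": cols[7],
            "misc": cols[9],
        })
    return tokens


def clause_head(tok, by_id):
    """Climb the tree from tok to the nearest ancestor with a clausal relation."""
    cur = tok
    while cur["head"] != 0:
        cur = by_id[cur["head"]]
        if cur["deprel"].split(":")[0] in CLAUSAL:
            break
    return cur


def label_wh_interrogatives(tokens, label="Cxn=Interrogative-WH"):
    """Add the (hypothetical) label to MISC of each clause containing an
    interrogative word; return the ids of the labeled clause heads."""
    by_id = {t["id"]: t for t in tokens}
    labeled = []
    for tok in tokens:
        if tok["feats"].get("PronType") != "Int":
            continue
        head = clause_head(tok, by_id)
        if label in head["misc"].split("|"):
            continue  # already labeled by another interrogative word
        head["misc"] = label if head["misc"] == "_" else head["misc"] + "|" + label
        labeled.append(head["id"])
    return labeled


tokens = parse_conllu(CONLLU)
print(label_wh_interrogatives(tokens))
print(tokens[3]["form"], tokens[3]["misc"])
```

A pattern this simple would both over- and under-generate on real treebanks (for instance, it ignores polar questions marked only by word order or particles), which is the kind of language-general versus language-particular trade-off the abstract alludes to.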