CL · Jun 7, 2021

Apurinã Universal Dependencies Treebank

arXiv:2106.03391v1 · Has code
Originality: Synthesis-oriented
AI Analysis

This work addresses resource scarcity for endangered languages such as Apurinã by providing foundational tools for linguistic research and preservation. The contribution is incremental, however, because it builds on the existing Universal Dependencies framework.

The paper tackles the lack of linguistic resources for Apurinã by creating the first Universal Dependencies treebank for the language, with 76 annotated sentences, 14 parts of speech, and seven augmented or new features. It also develops a finite-state description and supporting infrastructure for this endangered language.

This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language. The treebank contains 76 fully annotated sentences and applies 14 parts of speech, as well as seven augmented or new features, some of which are unique to Apurinã. Building the treebank has also served as an opportunity to develop a finite-state description of the language and to facilitate the transfer of open-source infrastructure to an endangered language of the Amazon. The source materials used in the initial treebank reflect fieldwork practices in which not all tokens of all sentences are equally annotated. For this reason, establishing regular annotation practices for the entire Apurinã treebank is an ongoing project.
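The abstract characterizes the treebank by its sentence count, part-of-speech inventory, and feature set. Universal Dependencies treebanks are distributed in the CoNLL-U format (ten tab-separated columns per word, `#` comment lines, blank lines between sentences), so statistics like these can be recomputed from the data file. The following is a minimal sketch under that assumption, not the authors' tooling. The function names `read_conllu` and `treebank_stats` are hypothetical, and the `SAMPLE` sentences are invented placeholders, not Apurinã data.

```python
from collections import Counter

def read_conllu(text):
    """Yield each sentence as a list of 10-column rows from CoNLL-U text."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comments (sent_id, text, ...)
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(cols)
    if sentence:                      # file may end without a trailing blank line
        yield sentence

def treebank_stats(text):
    """Return (sentence count, UPOS counts, feature-name counts)."""
    n_sentences = 0
    upos = Counter()
    feats = Counter()
    for sentence in read_conllu(text):
        n_sentences += 1
        for cols in sentence:
            tok_id = cols[0]
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1"),
            # so only words of the basic dependency tree are counted.
            if "-" in tok_id or "." in tok_id:
                continue
            upos[cols[3]] += 1                 # column 4: UPOS
            if cols[5] != "_":                 # column 6: FEATS, "Name=Value|..."
                for pair in cols[5].split("|"):
                    feats[pair.split("=", 1)[0]] += 1
    return n_sentences, upos, feats

# Hypothetical toy input: placeholder tokens and features, NOT Apurinã data.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "# text = tokA tokB",
    "1\ttokA\ttokA\tPRON\t_\tPerson=1\t2\tnsubj\t_\t_",
    "2\ttokB\ttokB\tVERB\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = toy-2",
    "1\ttokC\ttokC\tNOUN\t_\tGender=Masc|Number=Sing\t0\troot\t_\t_",
    "",
])

if __name__ == "__main__":
    n, upos, feats = treebank_stats(SAMPLE)
    print(n, len(upos), sorted(feats))  # prints: 2 3 ['Gender', 'Number', 'Person']
```

Applied to a real treebank file, `len(upos)` gives the size of the part-of-speech inventory actually used, and the keys of `feats` list the morphological features that appear. Skipping range and decimal IDs keeps multiword tokens from being counted twice.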

