CL · Jun 21, 2022

Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo

arXiv:2206.10343v1 · 585 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This paper provides a new linguistic resource for an endangered language, but its contribution is incremental: it applies existing methods to new data.

The authors created a Universal Dependencies (UD) treebank for Kakataibo, an endangered Amazonian language, using a collaborative classroom methodology. In monolingual experiments, they report a part-of-speech tagging accuracy of 85.7% and an unlabeled attachment score (UAS) of 72.3% for dependency parsing.
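Both figures are per-word fractions: tagging accuracy counts words whose predicted UPOS tag matches the gold tag, and UAS counts words whose predicted HEAD index matches the gold head, ignoring the dependency label. The following is a minimal sketch of that arithmetic, assuming gold and predicted CoNLL-U strings with identical tokenization. The helper names (`read_tokens`, `evaluate`) and the toy English sentence are hypothetical illustrations, not the paper's evaluation code or data.

```python
"""Minimal sketch: UPOS accuracy and unlabeled attachment score (UAS)
from gold and predicted CoNLL-U strings.

Assumes both strings contain the same sentences with identical
tokenization; official UD evaluation (e.g. the CoNLL 2018 shared-task
script) additionally aligns mismatched tokenizations.
"""


def read_tokens(conllu: str) -> list[tuple[str, str, str]]:
    """Return (FORM, UPOS, HEAD) for every syntactic word in a CoNLL-U string."""
    tokens = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges (3-4) and empty nodes (5.1)
        tokens.append((cols[1], cols[3], cols[6]))  # FORM, UPOS, HEAD
    return tokens


def evaluate(gold: str, pred: str) -> dict[str, float]:
    """UPOS accuracy and UAS: fractions of words whose tag / head match gold."""
    g, p = read_tokens(gold), read_tokens(pred)
    if len(g) != len(p):
        raise ValueError("this sketch assumes identical tokenization")
    n = len(g)
    upos_ok = sum(gt[1] == pt[1] for gt, pt in zip(g, p))
    head_ok = sum(gt[2] == pt[2] for gt, pt in zip(g, p))
    return {"upos_acc": upos_ok / n, "uas": head_ok / n}


# Toy English sentence (illustrative only, not from the Kakataibo treebank).
GOLD = "\n".join([
    "# text = the dog barks",
    "1\tthe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_",
])
# Same tags, but "the" is wrongly attached to word 3 instead of word 2.
PRED = GOLD.replace("1\tthe\tthe\tDET\t_\t_\t2", "1\tthe\tthe\tDET\t_\t_\t3")

if __name__ == "__main__":
    print(evaluate(GOLD, PRED))  # upos_acc = 1.0, uas = 2/3
```

With all tags correct but one of three heads wrong, tagging accuracy is 100% while UAS drops to 2/3, which is why the two metrics are reported separately.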

In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology we implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates. Then, we describe the general details of the treebank and the language-specific considerations implemented for the proposed annotation. Finally, we conduct experiments on part-of-speech tagging and syntactic dependency parsing, focusing on monolingual and transfer learning settings, where we study the impact of a Shipibo-Konibo treebank, another Panoan language resource.
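UD treebanks are distributed in the ten-column CoNLL-U format. A minimal sketch, assuming a CoNLL-U string as input, of the summary statistics a treebank description typically reports (sentence and word counts, UPOS distribution), plus one consistency check that can help when many students annotate in parallel: flagging sentences that do not have exactly one root (HEAD = 0). The function name `treebank_stats` and the toy sentences are hypothetical; the official UD validator (`validate.py` in the UD tools repository) performs far more checks than this.

```python
from collections import Counter


def treebank_stats(conllu: str) -> dict:
    """Count sentences, syntactic words and UPOS tags in a CoNLL-U string,
    and list (1-based) sentences that do not have exactly one root."""
    sentences = 0
    words = 0
    upos = Counter()
    bad_roots = []
    roots_in_sentence = 0
    in_sentence = False
    # A trailing "" guarantees the last sentence is flushed.
    for line in conllu.splitlines() + [""]:
        if not line.strip():  # blank line ends a sentence
            if in_sentence:
                sentences += 1
                if roots_in_sentence != 1:
                    bad_roots.append(sentences)
            in_sentence = False
            roots_in_sentence = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes have no HEAD/UPOS
        in_sentence = True
        words += 1
        upos[cols[3]] += 1
        if cols[6] == "0":
            roots_in_sentence += 1
    return {"sentences": sentences, "words": words,
            "upos": upos, "bad_roots": bad_roots}


# Toy data (illustrative only): the second sentence has two roots by mistake.
SAMPLE = "\n".join([
    "# text = the dog barks",
    "1\tthe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_",
    "",
    "# text = dogs bark",
    "1\tdogs\tdog\tNOUN\t_\t_\t0\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
])

if __name__ == "__main__":
    stats = treebank_stats(SAMPLE)
    print(stats["sentences"], stats["words"], stats["bad_roots"])  # 2 5 [2]
    print(stats["upos"].most_common())
```

Counting syntactic words rather than raw lines matters because multiword-token range lines would otherwise be double-counted alongside the words they contain.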

Code Implementations: 1 repo