CL · Apr 26, 2022

Developing Universal Dependency Treebanks for Magahi and Braj

arXiv:2204.12633v1 · 3 citations · h-index: 13 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides linguistic resources for two low-resourced languages, but it is incremental: it applies an existing annotation framework to new data.

The authors developed Universal Dependencies treebanks for the low-resourced Indian languages Magahi and Braj, containing 945 and around 500 annotated sentences respectively, and will release them publicly.

In this paper, we discuss the development of treebanks for two low-resourced Indian languages, Magahi and Braj, based on the Universal Dependencies framework. The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences, each annotated with lemmas, part-of-speech tags, morphological features and universal dependency relations. This paper describes the different dependency relations found in the two languages and gives some statistics for the two treebanks. The dataset will be made publicly available in the Universal Dependencies (UD) repository (https://github.com/UniversalDependencies/UD_Magahi-MGTB/tree/master) in the next (v2.10) release.
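UD treebanks are distributed in the CoNLL-U format, where each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). As a minimal sketch of how treebank statistics of the kind mentioned above could be computed, the snippet below counts sentences, part-of-speech tags and dependency relations. The function name `treebank_stats` and the two-sentence English sample are hypothetical illustrations, not taken from the paper or from the Magahi and Braj data.

```python
from collections import Counter

# Hypothetical sample in CoNLL-U layout; NOT taken from the Magahi or Braj treebanks.
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = She sleeps\n"
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
)


def treebank_stats(conllu_text):
    """Return (sentence count, UPOS counts, DEPREL counts) for CoNLL-U text."""
    sentences = 0
    upos = Counter()
    deprel = Counter()
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Skip malformed lines rather than guessing at their content.
            continue
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        upos[cols[3]] += 1    # column 4: universal POS tag
        deprel[cols[7]] += 1  # column 8: dependency relation to the head
    return sentences, upos, deprel


if __name__ == "__main__":
    n_sentences, upos_counts, deprel_counts = treebank_stats(SAMPLE)
    print(n_sentences, dict(upos_counts), dict(deprel_counts))
```

To run this on a released treebank, the same function could be applied to the contents of a `.conllu` file read with `open(path, encoding="utf-8").read()`; the file name depends on the release and is not specified here.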

